Abstract

Multi-User Detection is fundamental not only to cellular wireless communication but also to
Radio-Frequency Identification (RFID) technology that supports supply chain management. The challenge
of Multi-User Detection (MUD) is that of demodulating mutually interfering signals, and the two biggest
impediments are the asynchronous character of random access and the lack of channel state information.
Given that at any time instant the number of active users is typically small, the promise of Compressive
Sensing (CS) is the demodulation of sparse superpositions of signature waveforms from very few
measurements. This paper begins by unifying two front-end architectures proposed for MUD by showing
that both lead to the same discrete signal model. Algorithms are presented for coherent and noncoherent
detection that are based on iterative matching pursuit. Noncoherent detection is all that is needed in
the application to RFID technology, where only the identity of the active users is required.
The coherent detector is also able to recover the transmitted symbols. It is shown that compressive
demodulation requires $O(K \log N(\tau + 1))$ samples to recover $K$ active users, whereas standard MUD
requires $N(\tau + 1)$ samples to process $N$ total users with a maximal delay $\tau$. Performance guarantees
are derived for both coherent and noncoherent detection that are identical in the way they scale with the
number of active users. The power profile of the active users is shown to be less important than the SNR
of the weakest user. Gabor frames and Kerdock codes are proposed as signature waveforms, and numerical
examples demonstrate the superior performance of Kerdock codes: the same probability of error with
less than half the samples.

Index Terms 

multi-user detection, asynchronous random access, sparse recovery, iterative matching pursuit, Gabor 
frame, Kerdock code 

I. Introduction 

Demodulation of mutually interfering signals, or Multi-User Detection (MUD), is central to multiaccess
communications [1]. It includes the special case of the "on-off" Random Access Channel (RAC) [2] that
arises in modeling control channels in wireless networks, where active users transmitting their signature
waveforms can be modeled as sending 1's to the Base Station (BS), and inactive users can be modeled as
sending 0's. It also includes the special case of the Radio-Frequency Identification (RFID) system [3] that
arises in supply chain management, where each RFID tag is associated with a unique ID and attached to
a physical object. In large-scale RFID applications, an RFID reader interrogates the environment; all
tags within its operational range can be modeled as sending 1's, and tags outside its operational range
can be modeled as sending 0's. It also includes the special case of neighbor discovery in wireless ad hoc
networks [4], [5], where neighbors of a query node transmitting their identity information can be modeled
as sending 1's, and non-neighbors can be modeled as sending 0's. In all examples, the received signals
are possibly corrupted by noise.

State-of-the-art random access protocols, such as the IEEE 802.11 standards, rely on retransmission with
random delays at each active user to avoid collisions. These delays accumulate significantly as the
network grows; for example, the number of RFID tags can easily reach millions in practice.
It is therefore of great interest to allow multiple active users to transmit simultaneously and still be
able to recover the active users despite collisions. The MUD problem then becomes the recovery of the
active users, and it may be expanded to the demodulation of the transmitted symbols of each active user in
cellular communications. The two biggest impediments are the asynchronous character of random access and
the lack of Channel State Information (CSI) at the receiver. The signature waveforms of different users are
obtained by modulating a chip waveform using a digital sequence of length $L$. The total number of users
$N$ is severely constrained if all signature waveforms are orthogonal, giving the relationship $N \le L$. In
this paper we are interested in both coherent detection, when the CSI is known, and noncoherent detection,
when the CSI is unknown, under the conditions that the signature waveforms are nonorthogonal and the
delays of each user are unknown.
A. Main Contributions 

Our contributions in this paper are three-fold. Given that at any time instant the number of active users
$K$ is typically small, the promise of Compressive Sensing (CS) [6], [7] is the demodulation of sparse
superpositions of signature waveforms from very few measurements. A baseline architecture for MUD
is correlation of the received signal with a bank of matched filters [1], each with respect to a shift of a
signature waveform. A first drawback is the large number of filters, and hence of samples, required
when the total number of users $N$ is large: $N_\tau = N(\tau + 1)$, where $\tau$ is the maximum
delay. A second drawback is that the noise will be colored and amplified by the cross-correlations of
selected signature waveforms. An alternative baseline architecture is sampling the received analog signal
directly at the chip rate [8]. This approach does not amplify the noise but it does require a high-rate
Analog-to-Digital Converter (ADC).

First, we present two front-end architectures for compressive demodulation that lead to
mathematically equivalent discrete signal models. The first architecture is based on subsampling the
received signal uniformly at random, which reduces the ADC rate required in [8]. The second
architecture is based on a bank of generalized matched filters, which extends to asynchronous
communication the architecture for synchronous MUD proposed by Xie et al. [9] based on analog
compressed sensing [10]. The novelty is that both architectures are unified under the same discrete signal
model, and that they reduce the number of acquired samples $M$ below the length $L$ of the
signature waveforms.

Second, we present architectures for coherent and noncoherent detection, designed to recover active
users and transmitted (QPSK) symbols when the CSI is known, and to recover active users when the
CSI is unknown. Both algorithms are based on iterative matching pursuit [11] and assume a flat-fading
channel model so that each active user arrives at the receiver on a single path with an unknown delay.
We note that the generalization to a small number of arrival paths with a prescribed delay pattern is
straightforward. Noncoherent detection is more pertinent to applications like RFID and wireless ad hoc
networks, where only identification of active users is of interest. Our main theoretical contribution is
relating the probability of error for the proposed MUD algorithms to two geometric metrics associated
with the set of subsampled signature waveforms. These metrics, the worst-case and average coherence,
were introduced by Bajwa et al. in the context of model selection [12]. We provide explicit performance
guarantees in terms of these coherence metrics and the distribution of received signal powers. These
fundamental limits quantify the robustness of the compressive MUD algorithms to the "near-far" problem
[1] in multiple access communications. It is shown that our proposed compressive MUD algorithms require
$O(K \log N_\tau)$ samples to recover $K$ active users for both coherent and noncoherent detection, whereas
standard MUD requires $N_\tau$ samples. We further show that the minimum signal-to-noise ratio, dictated by
the weakest active user, rather than the power profile of all active users, plays an important role in the
performance of the proposed iterative algorithms; therefore power control is less critical.
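
To make the sample-count comparison concrete, the following Python sketch evaluates $N_\tau = N(\tau + 1)$
against a budget of the form $K \log N_\tau$. The system sizes, the function names and the unit constant in
front of $K \log N_\tau$ are illustrative assumptions; the $O(\cdot)$ statement does not fix that constant,
and a natural logarithm is assumed.

```python
import math


def conventional_samples(n_users: int, max_delay: int) -> int:
    """N_tau = N * (tau + 1): one sample per user and per candidate delay."""
    return n_users * (max_delay + 1)


def compressive_samples(k_active: int, n_users: int, max_delay: int,
                        const: float = 1.0) -> float:
    """Budget const * K * log(N_tau); `const` is a placeholder, not a value from the paper."""
    return const * k_active * math.log(conventional_samples(n_users, max_delay))


# Hypothetical system: 10,000 users, delays of up to 15 chips, 10 active users.
n_tau = conventional_samples(10_000, 15)
budget = compressive_samples(10, 10_000, 15)
print(f"standard MUD: {n_tau} samples, compressive budget (const = 1): {budget:.1f}")
```

The gap between the two counts grows with $N$, since the compressive budget depends on $N_\tau$ only
through its logarithm.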

Finally, we propose deterministic designs of cyclically extended signature waveforms that satisfy both
the geometric metrics linked to the decoding algorithms and the block-circulant structure that the cyclic
extension induces under asynchronous reception. Gabor frames and Kerdock codes are considered due to
their optimal coherence properties proved in [12], [13], and in this paper we extend this analysis to
uniformly random subsampled Gabor frames and Kerdock codes. Gabor frames are block circulant by
construction, as time-frequency expansions of a seed sequence. The Kerdock code is an extended cyclic
code over $\mathbb{Z}_4$ (Section IV, [14]) and can be arranged to exhibit a block-circulant structure. We
demonstrate through numerical simulations the performance of the proposed compressive MUD algorithms
using Gabor frames and Kerdock codes. The superior performance of Kerdock codes is of particular
practical interest: they attain the same probability of error with less than half the samples.

B. Relationship to Prior Work 

Here we describe how this paper differs from previous papers that have also formulated MUD as a
compressive sensing problem. The focus of most prior work is on synchronous communication, including
[2], [4], [5], [9], [15], [16]. In [2], Fletcher et al. studied MUD in the context of on-off RACs; in [4], [5],
Zhang et al. studied MUD in the context of neighbor discovery in wireless ad hoc networks; in [9], Xie
et al. studied MUD with simultaneous symbol detection in cellular communications. The synchronous
model provides insight into what might be possible, but it ignores the difficulty in estimating the delays
of individual users and in achieving synchronization.

A more general asynchronous model is considered by Applebaum et al. in [8]. These authors assume
synchronization at the chip or symbol level, different signature waveforms arrive with different discrete 
delays in some finite window, and the receiver uses convex optimization to recover the constituents of the 
sparse superposition. Thus users are associated with a Toeplitz block in the measurement matrix populated 
by allowable shifts in the signature waveform. In this paper we introduce a cyclic prefix in order to create 
a measurement matrix with a block cyclic structure which makes it easier to design codebooks using 
Gabor frames and Kerdock codes. 
The algorithms presented in this paper are based on iterative matching pursuit, and for uniformly
random delays the number of samples they require is of the same order, $O(K \log N_\tau)$, as the number
required by the convex optimization algorithm presented in [8]. This scaling is a significant improvement
over the Reduced-Dimension Decision Feedback (RDDF) detector described in [9], which requires on the
order of $O(K^2 \log N_\tau)$ samples. The reason that we are able to break the square-root bottleneck is
that by introducing more sophisticated coherence metrics we are able to treat average-case rather than
worst-case performance. These methods may be of independent interest. Note also that the complexity of our
algorithms is significantly less than that of convex optimization when the set of active users is highly
sparse ($K \ll N_\tau$) [17]. Moreover, it is possible to further reduce the complexity by terminating the
algorithm early and obtaining partial recovery of active users. When the channel is known at the receiver
we also improve upon the transmission rate reported by Xie et al. [9] by incorporating complex channel
gains in our model and moving from BPSK to QPSK signaling.

Our focus on deterministic signature waveforms is different from most previous work [2], [4], [5] which 
considers random waveforms. The fact that random waveforms can be shown to satisfy the Restricted 
Isometry Property [6] makes analysis possible but they are not very practical. The same criticism can 
be leveled at the RDDF detector described in [9] where randomness enters the choice of the coefficients 
determining the filter bank. Randomness also enters into [5] through the pattern of puncturing of
Reed-Muller codewords which serve as deterministic signature waveforms.

C. Organization of this paper and Notations 

The rest of the paper is organized as follows. Section II describes the system model, and Section III
presents two architectures for the compressive MUD front-end. Section IV proposes the coherent and
noncoherent detectors, along with their performance guarantees. Section V proves the main theorems.
Section VI presents the design of signature waveforms based on Gabor frames and Kerdock codes. Section
VII shows the numerical simulations and Section VIII concludes the paper.

Throughout the paper, we use capital bold letters $\mathbf{A}$ to denote matrices, small bold letters
$\mathbf{a}$ to denote vectors, and $\|\mathbf{A}\|_p$ and $\|\mathbf{a}\|_p$ to denote the $p$-norm of
$\mathbf{A}$ and $\mathbf{a}$, where $p = 2$ or $\infty$. $\mathbf{I}_N$ denotes the identity
matrix of dimension $N$, $\dagger$ denotes the pseudo-inverse, $\mathbf{A}^H$ denotes the Hermitian
(conjugate transpose) of $\mathbf{A}$, and $c^*$ denotes the conjugate of a complex number $c$.
II. System Model

Consider a multi-user system with $N$ total users, where the $n$th user, $n = 1, \ldots, N$, communicates
using a spread spectrum waveform of the form

$$x_n(t) = \sqrt{P_n} \sum_{\ell=0}^{L-1} a_{n,\ell}\, p(t - \ell T_c), \quad t \in [0, T), \tag{1}$$

where $p(t)$ is a unit-energy pulse, $\int |p(t)|^2 dt = 1$, satisfying
$\int p^*(t - \ell T_c)\, p(t - k T_c)\, dt = 0$ for $\ell \neq k$, with $(\cdot)^*$ denoting the
conjugate operation. The chip duration $T_c$ determines the system bandwidth, $T$ is the symbol
duration, $P_n$ denotes the transmit power of the $n$th user, and the spreading codeword

$$\mathbf{a}_n = [a_{n,0} \ \cdots \ a_{n,L-1}]^T, \quad n = 1, \ldots, N, \tag{2}$$

is the $L$-length (real- or complex-valued) codeword of unit energy, $\|\mathbf{a}_n\|_2 = 1$, assigned to
the $n$th user. Typically $L < N$. The notation $^T$ denotes the transpose of a matrix or vector.

To simplify the model, we consider a one-shot model, where the user sends one symbol at a time
rather than sending a sequence of symbols. The signal at the receiver is given by

$$y(t) = \sum_{n=1}^{N} g_n \sqrt{P_n}\, \delta_{\{n \in \mathcal{I}\}}\, b_n\, x_n(t - \tau'_n) + w(t), \tag{3}$$

where $g_n \in \mathbb{C}$ and $\tau'_n \in \mathbb{R}_+$ are the channel fading coefficient and the
continuous delay associated with the $n$th user, respectively. Define the power profile of all users as
$\mathbf{r} = [r_1, \ldots, r_N]^T$, where

$$r_n = g_n \sqrt{P_n}. \tag{4}$$

The power profile is determined by the power control at the transmitter and the channel coefficients
during transmission, and may take complex values.

We assume Quadrature Phase-Shift Keying (QPSK) modulation, where
$b_n \in \{(-1 - j)/\sqrt{2},\ (-1 + j)/\sqrt{2},\ (1 - j)/\sqrt{2},\ (1 + j)/\sqrt{2}\}$ is the transmitted
symbol of the $n$th user, and $w(t)$ is complex additive white Gaussian noise (AWGN) introduced by the
receiver circuitry with zero mean and variance $\sigma_0^2$. Denote by $\mathcal{I}$ the set of active
users. We assume the support of active users $\mathcal{I}$ is a uniformly random $K$-subset of
$[N] = \{1, \ldots, N\}$. The indicator function $\delta_{\{x\}} = 1$ if $x$ is true and
$\delta_{\{x\}} = 0$ otherwise.

Define the individual discrete delays $\tau_n = \lfloor \tau'_n / T_c \rfloor \in \mathbb{Z}_+$ and the
maximum discrete delay $\tau = \max_n \tau_n \in \mathbb{Z}_+$. While the values of $\tau_n$ are unknown,
$\tau$ is assumed to be known by the transmitters and receivers.

Each vector $\mathbf{a}_n$ is a cyclic extension (cyclic prefix) of a vector $\mathbf{d}_n$ of length
$P = L - (\tau + 1)$. As shown in Fig. 1, $\mathbf{a}_n$ is obtained by appending the first $\tau + 1$
symbols of $\mathbf{d}_n$ to the end of $\mathbf{d}_n$, so that the last $\tau + 1$ entries of
$\mathbf{a}_n$ repeat its first $\tau + 1$ entries. As a result, any length-$P$ sub-sequence of the vector
$\mathbf{a}_n$ is a cyclic shift of $\mathbf{d}_n$.




Fig. 1: Illustration of the cyclic prefix in the construction of spreading codewords. 
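
The cyclic-extension property can be checked numerically. The short Python sketch below builds an
extended codeword from a seed sequence and verifies that every length-$P$ window is a cyclic shift of the
seed. The seed values and the helper names `cyclic_extend` and `is_cyclic_shift` are hypothetical, and the
unit-energy normalization of the codeword is omitted for readability.

```python
def cyclic_extend(d, tau):
    """Append the first tau + 1 symbols of d to its end (length P -> L = P + tau + 1)."""
    return list(d) + list(d[: tau + 1])


def is_cyclic_shift(window, d):
    """Return True if `window` equals some cyclic shift of d."""
    p = len(d)
    return len(window) == p and any(
        list(window) == [d[(s + i) % p] for i in range(p)] for s in range(p)
    )


d = [1, -1, 1, 1, -1, -1, 1]   # hypothetical seed sequence, P = 7
tau = 3                        # maximum discrete delay
a = cyclic_extend(d, tau)      # extended codeword, L = 11

# Every window of P consecutive entries, for start positions 0, ..., tau + 1,
# is a cyclic shift of the seed sequence.
windows = [a[s: s + len(d)] for s in range(tau + 2)]
assert all(is_cyclic_shift(w, d) for w in windows)
```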

III. Compressive MUD Front-End 

In this Section we describe two front-end architectures for compressive MUD. The first is the chip- 
rate subsampling architecture considered in [18]; and the second is the asynchronous case of a bank of 
generalized matched filters architecture considered in [9]. We begin by showing that mathematically, the 
two front-end architectures are equivalent. 

A. Chip-rate subsampling architecture 

The chip-rate subsampling architecture directly samples the continuous received signal at the chip rate
using a high-rate ADC, as shown in Fig. 2 (a). The receiver only starts sampling when the waveforms of
all active users have arrived. Starting at sample $(\tau + 1)$, it collects $M$ uniformly random samples
over a window of length $L$. These samples, or linear combinations thereof, constitute the measurements
made by the receiver. We assume the codewords are of a reasonable length relative to the delays such that
$L > M$. As a result, the output data vector can be written as

$$\mathbf{y} = \mathbf{H} \mathbf{I}_\Omega \mathbf{A} \mathbf{R} \mathbf{b} + \mathbf{w}, \tag{5}$$

where $\mathbf{y} \in \mathbb{C}^{M \times 1}$, $\mathbf{A} \in \mathbb{C}^{L \times N_\tau}$, and the noise
$\mathbf{w} \in \mathbb{C}^{M \times 1}$ is complex Gaussian distributed with zero mean and covariance
$\sigma_0^2 \mathbf{H}\mathbf{H}^H$. The subsampling matrix is defined as
$\mathbf{I}_\Omega \in \mathbb{R}^{M \times L}$, where $\Omega$ denotes the indices of the samples, and
$\mathbf{H} \in \mathbb{C}^{M \times M}$ is a matrix that linearly combines the samples. The columns of
the matrix $\mathbf{A}$ have a block structure, with each block consisting of circulant shifts of a
codeword. Define a circulant matrix $\mathbf{A}_n$ as

$$\mathbf{A}_n = [\, \mathcal{T}_0 \mathbf{a}_n \ \ \mathcal{T}_1 \mathbf{a}_n \ \cdots \ \mathcal{T}_\tau \mathbf{a}_n \,] \in \mathbb{C}^{L \times (\tau + 1)}, \tag{6}$$

where $\mathcal{T}_k$ denotes the circulant shift matrix by $k$, and

$$\mathbf{A} = [\, \mathbf{A}_1 \ \cdots \ \mathbf{A}_N \,] \in \mathbb{C}^{L \times N_\tau}. \tag{7}$$

The vector $\mathbf{b} \in \mathbb{C}^{N_\tau \times 1}$ contains the transmitted symbols; it is a
concatenation of $N$ vectors $\mathbf{b}_n$ of length $\tau + 1$, each with at most one non-zero entry,
at the location of $\tau_n$.

The entries $R_{mm}$ of the diagonal matrix $\mathbf{R} \in \mathbb{C}^{N_\tau \times N_\tau}$ are a
function of the channel gain, the transmitted power, and the transmitted symbols, for
$n = 1, \ldots, N$ and $m = 0, \ldots, N_\tau - 1$.



Fig. 2: Illustration of two architectures: (a) the chip-rate subsampling architecture, and (b) a bank of
generalized matched filters architecture, where the first block is a linear filter with impulse response
$\delta(t + \tau T_c)$.
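
The discrete model (5) can be sketched in a few lines of Python. The example below assembles the
block-circulant matrix of (6)-(7), applies the row selection $\mathbf{I}_\Omega$, and forms a noise-free
measurement. It assumes $\mathbf{H}$ is the identity, takes the circular shift to act as a delay by $k$
positions, and uses hypothetical integer codewords without normalization; all function names are
illustrative.

```python
import random


def circ_shift(a, k):
    """Circularly shift the list a by k positions (assumed direction: delay by k)."""
    k %= len(a)
    return a[-k:] + a[:-k] if k else list(a)


def build_A(codewords, tau):
    """Columns of A: for each user, the tau + 1 circular shifts of its codeword."""
    cols = [circ_shift(a, k) for a in codewords for k in range(tau + 1)]
    length = len(codewords[0])
    return [[col[row] for col in cols] for row in range(length)]  # L x N(tau + 1)


def subsample(A, omega):
    """Apply I_Omega: keep only the rows of A indexed by omega."""
    return [A[i] for i in omega]


def measure(X, r_b):
    """Noise-free measurement X (R b), with r_b the sparse vector R b."""
    return [sum(x * v for x, v in zip(row, r_b)) for row in X]


codewords = [[1, 2, 3, 4], [5, 6, 7, 8]]       # hypothetical codewords, L = 4, N = 2
tau, M = 1, 2
A = build_A(codewords, tau)                     # 4 x 4 block-circulant matrix
omega = sorted(random.sample(range(4), M))      # M uniformly random sample indices
X = subsample(A, omega)                         # M x N_tau sensing matrix (H = identity)
y = measure(X, [0, 1, 0, 0])                    # user 1 active with discrete delay 1
```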



B. A bank of generalized matched filters 

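A generalized matched filter for compressive MUD [9] correlates $y(t)$ with a set of signals
$\{h_m(t)\}_{m=1}^{M}$, as shown in Fig. 2 (b). The measurement is taken by multiplying the time-shifted
signal $y(t + \tau T_c)$ with $h_m(t)$ and integrating over a window of length $T - \tau T_c$, where $T$ is
the symbol period. The output of the $m$th measurement is given by

$$y_m = \int_0^{T - \tau T_c} h_m^*(t)\, y(t + \tau T_c)\, dt = \langle h_m(t),\, y(t + \tau T_c) \rangle, \quad m = 1, \ldots, M.$$

Writing this in vector notation, we have

$$\mathbf{y} = \mathbf{B} \mathbf{R} \mathbf{b} + \mathbf{w},$$

where

$$\mathbf{B} = [\, \mathbf{B}_1, \cdots, \mathbf{B}_N \,] \in \mathbb{C}^{M \times N_\tau}, \tag{9}$$

with $\mathbf{B}_n \in \mathbb{C}^{M \times (\tau + 1)}$ for $n = 1, \ldots, N$. The $(m, \ell + 1)$th entry
of $\mathbf{B}_n$ is given by

$$[\mathbf{B}_n]_{m, \ell + 1} = \langle h_m(t),\, x_n(t + (\tau - \ell) T_c) \rangle, \quad \ell = 0, \ldots, \tau, \quad m = 1, \ldots, M.$$

The noise vector $\mathbf{w}$ is an $M$-dimensional complex Gaussian vector with zero mean and covariance
matrix $\boldsymbol{\Sigma}$ with entries

$$[\boldsymbol{\Sigma}]_{mk} = \sigma_0^2 \int_0^{T - \tau T_c} h_m^*(t)\, h_k(t)\, dt. \tag{10}$$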

We now parameterize the generalized matched filters $\{h_m(t)\}_{m=1}^{M}$. In [9], the matched filters are
constructed as linear combinations of the bi-orthogonal signals of the user signature waveforms. Here we
consider a more general construction that leads to a discrete model equivalent to that of the chip-rate
subsampling (5). Assume the measurement signals are constructed using the chip waveform and chip
sequences as

$$h_m(t) = \sum_{\ell=0}^{L-1} h_{m,\ell}\, p(t - \ell T_c), \tag{11}$$

where

$$\mathbf{h}_m = [h_{m,0} \ \cdots \ h_{m,L-1}]^T, \quad m = 1, \ldots, M, \tag{12}$$

is the $L$-length (real- or complex-valued) codeword for the $m$th measurement signal. By this
parameterization, for $\ell = 0, \ldots, \tau$,

$$\begin{aligned}
[\mathbf{B}_n]_{m,\ell+1} &= \int_0^{T - \tau T_c} h_m^*(t)\, x_n(t + (\tau - \ell) T_c)\, dt \\
&= \sum_{u=0}^{L-1} \sum_{v=0}^{L-1} h_{m,u}^*\, a_{n,v} \int_0^{T - \tau T_c} p^*(t - u T_c)\, p(t + (\tau - \ell - v) T_c)\, dt \\
&= \sum_{u=0}^{L-1} h_{m,u}^*\, [\mathcal{T}_\ell \mathbf{a}_n]_u = \mathbf{h}_m^H (\mathcal{T}_\ell \mathbf{a}_n),
\end{aligned} \tag{13}$$



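where we have used
$\int_0^{T - \tau T_c} p^*(t - u T_c)\, p(t + (\tau - \ell - v) T_c)\, dt = \delta_{\{u = \tau - \ell - v\}}$.
Hence, from (9) and (13), we obtain that

$$\mathbf{B} = [\, \mathbf{B}_1 \ \cdots \ \mathbf{B}_N \,] = \mathbf{H} \mathbf{A}, \tag{14}$$

where

$$\mathbf{B}_n = \begin{bmatrix} \mathbf{h}_1^H(\mathcal{T}_0 \mathbf{a}_n) & \cdots & \mathbf{h}_1^H(\mathcal{T}_\tau \mathbf{a}_n) \\ \vdots & & \vdots \\ \mathbf{h}_M^H(\mathcal{T}_0 \mathbf{a}_n) & \cdots & \mathbf{h}_M^H(\mathcal{T}_\tau \mathbf{a}_n) \end{bmatrix}.$$

The noise in the $m$th measurement is given by $\langle h_m(t), w(t) \rangle$, which is a complex Gaussian
random variable with zero mean; the covariance matrix (10) is given by
$[\boldsymbol{\Sigma}]_{mk} = \sigma_0^2\, \mathbf{h}_m^H \mathbf{h}_k$. Define the matrix
$\mathbf{H} = [\, \mathbf{h}_1 \ \cdots \ \mathbf{h}_M \,]^H$, whose $m$th row is $\mathbf{h}_m^H$.

Substituting (14) into (9), we obtain that when the filters $\{h_m(t)\}$ are parameterized by (11), the
measurement vector can be written as

$$\mathbf{y} = \mathbf{H} \mathbf{A} \mathbf{R} \mathbf{b} + \mathbf{w}, \tag{15}$$

where $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \sigma_0^2 \mathbf{H}\mathbf{H}^H)$ is a complex Gaussian
random vector with zero mean and covariance matrix $\sigma_0^2 \mathbf{H}\mathbf{H}^H$. Given the output
(15) of the bank of generalized matched filters, we note the following special cases for $\mathbf{H}$:

• If the filter matrix in (15) is taken to be $\mathbf{H}\mathbf{I}_\Omega$, with $\mathbf{H}$ the
combining matrix of (5), then $(\mathbf{H}\mathbf{I}_\Omega)(\mathbf{H}\mathbf{I}_\Omega)^H = \mathbf{H}\mathbf{H}^H$,
which means the output (15) of the second architecture is mathematically equivalent to that of the
chip-rate subsampling architecture (5).

• In the first architecture, if we choose $\mathbf{H}$ to be an orthogonal matrix, then
$\mathbf{H}\mathbf{H}^H = \mathbf{I}_M$, the output signal power of each measurement is $M/N$ and the noise
power is $\sigma_0^2$. The signal-to-noise ratio per measurement is $M/(N\sigma_0^2)$.

• In the second architecture, if we choose $\mathbf{H}$ to be a tight frame, then
$\mathbf{H}\mathbf{H}^H = (N/M)\mathbf{I}_M$. For each measurement, the output signal power is 1, and the
noise power is $(N/M)\sigma_0^2$. The signal-to-noise ratio per measurement is $M/(N\sigma_0^2)$.

Table I is a summary of the comparison between these two architectures when
$\mathbf{H}\mathbf{H}^H = \mathbf{I}_M$ and $\mathbf{H}\mathbf{H}^H = (N/M)\mathbf{I}_M$, where
$\mathbf{I}_M$ is the identity matrix of dimension $M$. Note that both architectures lead
to the same discrete signal model (15). In the following, we will focus on signal recovery and signature
waveform designs based on (15).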
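
The equivalence in the first special case rests on the fact that a row-selection matrix satisfies
$\mathbf{I}_\Omega \mathbf{I}_\Omega^H = \mathbf{I}_M$. A minimal Python check is sketched below; the
$2 \times 2$ combining matrix, the sample indices and the helper names are hypothetical choices made only
for illustration.

```python
def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def conj_transpose(A):
    """Hermitian (conjugate transpose) of a matrix stored as a list of rows."""
    return [[complex(v).conjugate() for v in col] for col in zip(*A)]


def selector(omega, length):
    """I_Omega: M x L matrix with a single 1 per row, picking the samples in omega."""
    return [[1 if j == i else 0 for j in range(length)] for i in omega]


H = [[1, 1j], [1, -1j]]          # hypothetical 2 x 2 combining matrix
I_om = selector([1, 3], 5)       # keep samples 1 and 3 out of L = 5
H_bar = matmul(H, I_om)          # 2 x 5 filter matrix H I_Omega

# The noise covariance structure is unchanged by the row selection.
assert matmul(H_bar, conj_transpose(H_bar)) == matmul(H, conj_transpose(H))
```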



TABLE I: Comparison of the two architectures.

Architecture | Chip-rate subsampling | Generalized matched filter bank
# of Users | $N$ | $N$
# of Filters | 1 | $M$
# of Samples | $M$ | 1
Sampling Rate | $(N/M)T_c$ | $T$ ($T \gg T_c$)
Signal Power | $M/N$ | 1
Noise Power | $\sigma_0^2$ | $(N/M)\sigma_0^2$
SNR per measurement | $M/N$ | $M/N$

IV. Coherent and Noncoherent Detection Algorithms 

In the following sections, we choose $\mathbf{H}$ to be a tight frame, and hence the noise is white; we
assume the noise variance is $\sigma^2 = (N/M)\sigma_0^2$. Define

$$\mathbf{X} = \mathbf{H}\mathbf{A} = [\, \mathbf{x}_1, \cdots, \mathbf{x}_{N_\tau} \,] \in \mathbb{C}^{M \times N_\tau}. \tag{16}$$

We further assume that the columns of $\mathbf{H}$ and $\mathbf{A}$ are scaled so that each column of
$\mathbf{X}$ is unit-norm: $\|\mathbf{x}_n\|_2 = 1$. Hence the model (15) becomes

$$\mathbf{y} = \mathbf{X}\mathbf{R}\mathbf{b} + \mathbf{w}, \tag{17}$$

where $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_M)$. Based on this model, we first
present a coherent matching pursuit detector based on iterative thresholding to detect active users and
their transmitted symbols, when $\mathbf{R}$ is assumed known. We also present a noncoherent matching
pursuit detector to detect active users when $\mathbf{R}$ is assumed unknown, which is adapted from the
Orthogonal Matching Pursuit (OMP) algorithm [11].
A. Coherent and Noncoherent Matching Pursuit Detector

The coherent matching pursuit detector is described in Algorithm 1. With knowledge of the number
of active users $K$, the algorithm performs $K$ iterations. In each iteration, Algorithm 1 first finds the
user whose delayed signature waveform has the strongest correlation with the residual, then subtracts its
exact contribution to the received signal and updates the residual. Since we assume a flat-fading
channel¹, there is only one nonzero entry in each user block. Therefore, in the next iteration we can
restrict our search to the remaining users. To find the transmitted symbols of each active user, we adopt
simple quadrant detectors as in (18) and (19). Our algorithm doubles the rate of the modulation scheme in
[9] by considering the complex nature of the power profiles.



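Algorithm 1 Coherent Matching Pursuit Detector for Asynchronous MUD

1: Input: matrices $\mathbf{X}$ and $\mathbf{R}$, signal vector $\mathbf{y}$, number of active users $K$

2: Output: active user set $\hat{\mathcal{I}}$, transmitted symbols $\hat{\mathbf{b}}$

3: Initialize: $\mathcal{I}_0 :=$ empty set, $\mathbf{b}_0 := \mathbf{0}$, $\mathbf{v}_0 := \mathbf{y}$, $\mathcal{X}_0 := \{1, \ldots, N_\tau\}$

4: for $k = 0$ to $K - 1$ do

5: Compute: $\mathbf{f} := \mathbf{X}^H \mathbf{v}_k$

6: Find $i = \arg\max_{n \in \mathcal{X}_k} |f_n|$

7: Detect active users: $\mathcal{I}_{k+1} = \mathcal{I}_k \cup \{\lceil i/(\tau + 1) \rceil\}$

8: Update: $\mathcal{X}_{k+1} = \mathcal{X}_k \setminus \{(\lceil i/(\tau + 1) \rceil - 1)(\tau + 1) + 1, \ldots, \lceil i/(\tau + 1) \rceil (\tau + 1)\}$

9: Detect symbols:

$$\Re\{[\mathbf{b}_{k+1}]_i\} = \tfrac{1}{\sqrt{2}}\, \mathrm{sgn}(\Re[r_i^* f_i]), \tag{18}$$

$$\Im\{[\mathbf{b}_{k+1}]_i\} = \tfrac{1}{\sqrt{2}}\, \mathrm{sgn}(\Im[r_i^* f_i]), \tag{19}$$

where sgn is the sign function, and $\Re(x)$ and $\Im(x)$ take the real part and imaginary part of $x$,
respectively.

10: Update $\mathbf{b}$: $[\mathbf{b}_{k+1}]_n = [\mathbf{b}_k]_n$ for $n \neq i$.

11: Update residual: $\mathbf{v}_{k+1} = \mathbf{y} - \mathbf{X}\mathbf{R}\mathbf{b}_{k+1}$

12: end for

13: $\hat{\mathcal{I}} = \mathcal{I}_K$, $\hat{\mathbf{b}} = \mathbf{b}_K$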
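
The steps of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration
rather than a reference implementation: it uses 0-based column and user indices (so column $i$ belongs to
user $\lfloor i/(\tau + 1) \rfloor$), passes only the diagonal of $\mathbf{R}$, recomputes the residual
from $\mathbf{y}$ and the accumulated symbol vector, and the function name `coherent_mp` is an assumption.

```python
import math


def coherent_mp(X, r_diag, y, K, tau):
    """Sketch of the coherent detector with 0-based indices.

    X: M x N_tau matrix as a list of rows; r_diag: diagonal of R; y: measurements.
    Returns (set of detected users, dict mapping column index -> detected QPSK symbol).
    """
    n_rows, n_cols = len(X), len(X[0])
    candidates = set(range(n_cols))
    users, b = set(), [0j] * n_cols
    residual = list(y)
    for _ in range(K):
        # f = X^H v_k: correlate every column with the current residual
        f = [sum(X[m][n].conjugate() * residual[m] for m in range(n_rows))
             for n in range(n_cols)]
        i = max(candidates, key=lambda n: abs(f[n]))
        user = i // (tau + 1)
        users.add(user)
        # flat fading: one delay per user, so remove the user's whole block
        candidates -= set(range(user * (tau + 1), (user + 1) * (tau + 1)))
        # quadrant detector on r_i^* f_i, as in (18)-(19)
        z = r_diag[i].conjugate() * f[i]
        b[i] = complex(math.copysign(1, z.real), math.copysign(1, z.imag)) / math.sqrt(2)
        # residual: y minus the contribution of all symbols detected so far
        residual = [y[m] - sum(X[m][n] * r_diag[n] * b[n] for n in range(n_cols))
                    for m in range(n_rows)]
    return users, {n: s for n, s in enumerate(b) if s != 0}


# Hypothetical noise-free toy case: identity sensing matrix, N = 2 users, tau = 1.
s0, s1 = complex(1, 1) / math.sqrt(2), complex(-1, 1) / math.sqrt(2)
X = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
users, symbols = coherent_mp(X, [1, 1, 2j, 2j], [0, s0, 2j * s1, 0], K=2, tau=1)
```

In this toy case the columns are orthonormal, so both users and both symbols are recovered exactly; with
a coherent $\mathbf{X}$ and noise, recovery is governed by the guarantees of Section IV-B.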



The noncoherent matching pursuit detector is described in Algorithm 2. We denote by
$\mathbf{X}_{\mathcal{I}}$ the submatrix (subvector) consisting of the columns (entries) of $\mathbf{X}$
indexed by $\mathcal{I}$. Given one symbol, it is not possible to resolve the ambiguity in channel phase.
Algorithm 2 therefore detects whether a user is active or inactive, and does not recover the transmitted
symbols. The residual is updated by subtracting the orthogonal projection of $\mathbf{y}$ onto the signal
space of the detected users. The noncoherent detector is appropriate for the situation where we do not
have access to the channel state information and are only interested in detecting the active users. For
example, it is more pertinent in applications like RFID where it is only important to register the
presence or absence of a tag.

¹ Our results can be easily generalized to a multipath channel model.



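Algorithm 2 Noncoherent Matching Pursuit Detector for Asynchronous MUD

1: Input: matrix $\mathbf{X}$, signal vector $\mathbf{y}$, number of active users $K$

2: Output: active user set $\hat{\mathcal{I}}$

3: Initialize: $\mathcal{I}_0 :=$ empty set, $\mathbf{v}_0 := \mathbf{y}$, $\mathcal{X}_0 := \{1, \ldots, N_\tau\}$

4: for $k = 0$ to $K - 1$ do

5: Compute: $\mathbf{f} := \mathbf{X}^H \mathbf{v}_k$

6: Find $i = \arg\max_{n \in \mathcal{X}_k} |f_n|$

7: Detect active users: $\mathcal{I}_{k+1} = \mathcal{I}_k \cup \{\lceil i/(\tau + 1) \rceil\}$

8: Update: $\mathcal{X}_{k+1} = \mathcal{X}_k \setminus \{(\lceil i/(\tau + 1) \rceil - 1)(\tau + 1) + 1, \ldots, \lceil i/(\tau + 1) \rceil (\tau + 1)\}$

9: Update residual: $\mathbf{v}_{k+1} = \mathbf{y} - \mathbf{X}_{\mathcal{I}_{k+1}} \mathbf{X}_{\mathcal{I}_{k+1}}^{\dagger} \mathbf{y}$

10: end for

11: $\hat{\mathcal{I}} = \mathcal{I}_K$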
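
A plain-Python sketch of Algorithm 2 is given below. As an implementation choice that is not taken from
the paper, the orthogonal projection $\mathbf{X}_{\mathcal{I}} \mathbf{X}_{\mathcal{I}}^{\dagger} \mathbf{y}$
is computed with a Gram-Schmidt basis of the selected columns instead of an explicit pseudo-inverse, and
the projection uses the selected (user, delay) columns. Indices are 0-based and the function names are
hypothetical.

```python
def _inner(u, v):
    """Inner product u^H v for complex vectors stored as lists."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def noncoherent_mp(X, y, K, tau):
    """Sketch of the noncoherent detector with 0-based indices; returns detected users."""
    n_rows, n_cols = len(X), len(X[0])
    cols = [[X[m][n] for m in range(n_rows)] for n in range(n_cols)]
    candidates = set(range(n_cols))
    users, basis = set(), []       # basis: orthonormal vectors spanning the chosen columns
    residual = list(y)
    for _ in range(K):
        i = max(candidates, key=lambda n: abs(_inner(cols[n], residual)))
        user = i // (tau + 1)
        users.add(user)
        candidates -= set(range(user * (tau + 1), (user + 1) * (tau + 1)))
        # Gram-Schmidt step: orthogonalize the new column against the current basis
        q = list(cols[i])
        for e in basis:
            c = _inner(e, q)
            q = [qv - c * ev for qv, ev in zip(q, e)]
        norm = abs(_inner(q, q)) ** 0.5
        if norm > 1e-12:
            basis.append([qv / norm for qv in q])
        # residual: y minus its orthogonal projection onto the selected columns
        residual = list(y)
        for e in basis:
            c = _inner(e, y)
            residual = [rv - c * ev for rv, ev in zip(residual, e)]
    return users


# Hypothetical toy case: identity sensing matrix, N = 2 users, tau = 1, unknown gains.
X = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
detected = noncoherent_mp(X, [0, 0.5j, 3, 0], K=2, tau=1)
```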



The complexity of the coherent detector is lower than that of the noncoherent detector, since no
orthogonalization is necessary to update the residual. In both detectors, it is possible to terminate the
algorithm early and obtain partial recovery of active users².

B. Performance Guarantees 

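The performance guarantees for the two algorithms are expressed in terms of two fundamental coherence
metrics of $\mathbf{X}$. The first is the worst-case coherence:

$$\mu(\mathbf{X}) = \max_{n \neq m} |\mathbf{x}_n^H \mathbf{x}_m|, \tag{20}$$

which is widely used in characterizing the performance of sparse recovery algorithms. The second is the
average coherence, defined as

$$\nu(\mathbf{X}) = \frac{1}{N_\tau - 1} \max_{n} \Big| \sum_{m \neq n} \mathbf{x}_n^H \mathbf{x}_m \Big| = \frac{1}{N_\tau - 1} \big\| (\mathbf{X}^H \mathbf{X} - \mathbf{I}_{N_\tau}) \mathbf{1} \big\|_\infty, \tag{21}$$

where $\mathbf{1}$ is an all-one vector.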
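
Both metrics are inexpensive to compute from the Gram matrix. The following Python sketch evaluates (20)
and (21) for a small matrix with unit-norm columns; the example matrix and the function names are
hypothetical.

```python
def gram(X):
    """Gram matrix G = X^H X for X given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(complex(a).conjugate() * b for a, b in zip(u, v)) for v in cols]
            for u in cols]


def worst_case_coherence(X):
    """mu(X): largest |x_n^H x_m| over distinct columns, as in (20)."""
    G = gram(X)
    n = len(G)
    return max(abs(G[i][j]) for i in range(n) for j in range(n) if i != j)


def average_coherence(X):
    """nu(X): largest |sum over m != n of x_n^H x_m|, divided by (columns - 1), as in (21)."""
    G = gram(X)
    n = len(G)
    return max(abs(sum(G[i][j] for j in range(n) if j != i)) for i in range(n)) / (n - 1)


s = 2 ** -0.5
X = [[1, 0, s],
     [0, 1, s]]                 # three unit-norm columns in two dimensions
mu, nu = worst_case_coherence(X), average_coherence(X)
```

For this example the third column has inner product $1/\sqrt{2}$ with each of the other two, which gives
$\mu = \nu = 1/\sqrt{2}$.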

We say that a matrix $\mathbf{X}$ satisfies the coherence property if the following two conditions hold:

$$\mu(\mathbf{X}) \le \frac{0.1}{\sqrt{2 \log N_\tau}}, \qquad \nu(\mathbf{X}) \le \frac{\mu(\mathbf{X})}{\sqrt{M}}. \tag{22}$$

² The performance guarantees can be easily generalized to partial recovery.

In addition, we say that a matrix $\mathbf{X}$ satisfies the strong coherence property if the following
two conditions hold:

$$\mu(\mathbf{X}) \le \frac{1}{240 \log N_\tau}, \qquad \nu(\mathbf{X}) \le \frac{\mu(\mathbf{X})}{\sqrt{M}}. \tag{23}$$

Note that the condition on average coherence, $\nu(\mathbf{X}) \le \mu(\mathbf{X})/\sqrt{M}$, can be
achieved with essentially no cost via "wiggling", i.e. flipping the signs (or phases) of the columns of
$\mathbf{X}$, which changes neither the worst-case coherence $\mu(\mathbf{X})$ nor the spectral norm
$\|\mathbf{X}\|_2$ [19]. For simplicity we shall write $\mu = \mu(\mathbf{X})$ and
$\nu = \nu(\mathbf{X})$.

We sort the amplitudes $|r_n|$ of the entries of the $K$-sparse vector $\mathbf{r}$ corresponding to the
active users from the largest to the smallest, and denote them as $|r|_{(1)} \ge \cdots \ge |r|_{(K)}$. Let

$$|r|_{\min} = |r|_{(K)}.$$

We define the $n$th Signal-to-Noise Ratio ($\mathrm{SNR}_{(n)}$) and the $n$th Largest-to-Average Ratio ($\mathrm{LAR}_{(n)}$) as
SNR„=— , LAR„ = ^^M^, n = l,...,K. 

The Signal-to-Noise Ratio (SNR) and minimum Signal-to-Noise Ratio (SNRmin) are defined respectively 
as 



I T* II ^ I ^1 2 



min 



We then have the following performance guarantee for the coherent matching pursuit detector.

Theorem 1. Suppose that $N_\tau = N(\tau + 1) \ge 128$, that the noise $\mathbf{w}$ is distributed as
$\mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_M)$, and that $\mathbf{X}$ satisfies the coherence property.
If the number of active users satisfies



$$K \le \min\left\{ \frac{M}{2 \log N_\tau},\ \frac{1}{c^2 \mu^2 \log N_\tau} \right\} \tag{24}$$



for $c = 20\sqrt{2}$, and if the power profile of active users satisfies condition (25)
for $1 \le k \le K$, then Algorithm 1 satisfies

$$\Pr\{\hat{\mathbf{b}} \neq \mathbf{b}\} \le (4 + \pi^{-1}) N_\tau^{-1}.$$
Since $\mathrm{LAR}_{(k)} \ge \mathrm{LAR}_{(K)}$ for $1 \le k \le K$, (25) can be satisfied if

lAR 8 ( KlogNr \ 

Let $\theta = c \mu \sqrt{K \log N_\tau} \in (0, 1)$; then (26) implies that the number of active users is
bounded by

$$K \le \frac{M (1 - \theta)^2\, \mathrm{SNR}_{\min}}{8 \log N_\tau}.$$

Combining this with (24), we have the following corollary. 

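Corollary 2. Suppose that $N_\tau = N(\tau + 1) \ge 128$, that the noise $\mathbf{w}$ is distributed as
$\mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_M)$, and that $\mathbf{X}$ satisfies the coherence property.
We write $\mu = c_1 M^{-1/\gamma}$ for some $c_1 > 0$ ($c_1$ may depend on $N_\tau$) and
$\gamma \in \{0\} \cup [2, \infty)$. Then Algorithm 1 satisfies
$\Pr\{\hat{\mathbf{b}} \neq \mathbf{b}\} \le (4 + \pi^{-1}) N_\tau^{-1}$ as long as the number of
active users $K$ satisfies

$$K \le \max_{0 < \theta < 1} \min\left\{ \frac{M}{2 \log N_\tau},\ \frac{M (1 - \theta)^2\, \mathrm{SNR}_{\min}}{8 \log N_\tau},\ \frac{\theta^2 M^{2/\gamma}}{c_2^2 \log N_\tau} \right\}, \tag{27}$$

where $c_2 = 20\sqrt{2}\, c_1$.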
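
The right-hand side of (27) is easy to evaluate numerically. The Python sketch below performs a grid
search over $\theta \in (0, 1)$. It assumes a natural logarithm, a linear-scale $\mathrm{SNR}_{\min}$ and
$\gamma \ge 2$ (the case $\gamma = 0$ is not handled); the function name, the grid resolution and the
example parameters are hypothetical.

```python
import math


def max_active_users(M, n_tau, snr_min, c1, gamma, grid=999):
    """Evaluate the right-hand side of (27) by a grid search over theta in (0, 1)."""
    log_n = math.log(n_tau)
    c2 = 20 * math.sqrt(2) * c1
    best = 0.0
    for j in range(1, grid + 1):
        theta = j / (grid + 1)
        bound = min(
            M / (2 * log_n),                                   # from the StOC requirement
            M * (1 - theta) ** 2 * snr_min / (8 * log_n),      # weakest-user SNR term
            theta ** 2 * M ** (2 / gamma) / (c2 ** 2 * log_n)  # worst-case coherence term
        )
        best = max(best, bound)
    return best


# Hypothetical parameters: M = 1024 samples, N_tau = 4096, SNR_min = 10, mu = M^(-1/2).
print(max_active_users(M=1024, n_tau=4096, snr_min=10.0, c1=1.0, gamma=2))
```

Because the constants in the guarantee are conservative, the evaluated bound can be small for moderate
$M$; the sketch is meant to expose how the three terms trade off rather than to predict empirical
performance.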

We have the following performance guarantee for the noncoherent matching pursuit detector.

Theorem 3. Suppose that $N_\tau = N(\tau + 1) \ge 128$, that the noise $\mathbf{w}$ is distributed as
$\mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_M)$, and that $\mathbf{X}$ satisfies the coherence property.
If the number of active users satisfies

K<mm\^, — ^ , „ ^ I (28) 

for $c_3 = 50\sqrt{2}$ and $c_4 = 104\sqrt{2}$, and if the power profile of active users satisfies

LAR,,, > ' . , (29) 

a- C3^i^iK -k + 1) log Nr^ \ MSNR I' 

for $1 \le k \le K$, then Algorithm 2 satisfies

$$\Pr\{\hat{\mathcal{I}} \neq \mathcal{I}\} \le (K \pi^{-1} + 6) N_\tau^{-1}.$$

Similarly, using the fact that $\mathrm{LAR}_{(k)} \ge \mathrm{LAR}_{(K)}$ for $1 \le k \le K$, (29) can be
satisfied if

^^^^^'(1-C3mV1^I^'(^^)' ^^^^ 
and the following corollary becomes straightforward. 
Corollary 4. Suppose that $N_\tau = N(\tau + 1) \ge 128$, that the noise $\mathbf{w}$ is distributed as
$\mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_M)$, and that $\mathbf{X}$ satisfies the coherence property.
We write $\mu = c_1 M^{-1/\gamma}$ for some $c_1 > 0$ ($c_1$ may depend on $N_\tau$) and
$\gamma \in \{0\} \cup [2, \infty)$. Then Algorithm 2 satisfies
$\Pr\{\hat{\mathcal{I}} \neq \mathcal{I}\} \le (K \pi^{-1} + 6) N_\tau^{-1}$ as long as the number
of active users $K$ satisfies condition (31), where $c_3 = 50\sqrt{2}\, c_1$ and $c_4 = 104\sqrt{2}$.

Theorem 1 and Theorem 3 imply that with both coherent and noncoherent detectors, the system can
support $K = O(M / \log N_\tau)$ users with $M$ samples. In other words, the system can support $K$ users
with $M = O(K \log N_\tau)$ samples. Since both detectors are based on iterative thresholding, the power
profile of the different active users, defined in (4), enters the analysis only through the quantities
$\mathrm{LAR}_{(k)}$, and plays a less important role than $\mathrm{SNR}_{\min}$, the SNR of the weakest
active user, in determining the performance. The performance of the two algorithms is identical in terms
of scaling.

V. Proofs of Main Theorems

Central to the proof is the notion of the $(K, \epsilon, \delta)$-Statistical Orthogonality Condition
(StOC) introduced in [12], which can be related to the worst-case and average coherence of the matrix
$\mathbf{X}$. We prove that the probability of error is vanishingly small if, with high probability,
$\mathbf{X}$ satisfies the StOC and the noise $\mathbf{w}$ is uniformly bounded.

A. Preparations

We first introduce an alternative way to represent the measurement model. We can write the vector of
transmitted symbols together with the power, $\mathbf{R}\mathbf{b}$, as a concatenation of a random
permutation matrix and a deterministic $K$-sparse vector $\mathbf{z} \in \mathbb{C}^{N_\tau}$. The form of
the $K$-sparse vector is given by $\mathbf{z} = [z_1, \cdots, z_K, 0, \cdots, 0]^T$. Let
$\bar{\Pi} = (\pi_1, \cdots, \pi_{N_\tau})$ be a random permutation of $[N_\tau]$. Let
$\mathbf{P}_{\bar{\Pi}}$ be the $N_\tau \times N_\tau$ permutation matrix
$\mathbf{P}_{\bar{\Pi}} = [\mathbf{e}_{\pi_1}, \cdots, \mathbf{e}_{\pi_{N_\tau}}]^T$, with $\mathbf{e}_n$
being the $n$th column of the identity matrix $\mathbf{I}_{N_\tau}$. Given this notation, the assumption
that $\mathcal{I}$ is a random subset of $[N_\tau]$ is equivalent to stating that
$\mathbf{z} = \mathbf{P}_{\bar{\Pi}} \mathbf{R}\mathbf{b}$. Hence the measurement equation (17) can be
written as

$$\mathbf{y} = \mathbf{X}\mathbf{R}\mathbf{b} + \mathbf{w} = \mathbf{X}\mathbf{P}_{\bar{\Pi}}^T \mathbf{z} + \mathbf{w} = \mathbf{X}_{\Pi}\, \mathbf{r}_{\mathcal{I}} + \mathbf{w}, \tag{32}$$
where $\Pi = (\pi_1, \cdots, \pi_K)$ denotes the first $K$ elements of the random permutation
$\bar{\Pi}$, $\mathbf{X}_{\Pi}$ denotes the $M \times K$ sub-matrix obtained by collecting the columns of
$\mathbf{X}$ corresponding to the indices in $\Pi$, and the vector
$\mathbf{r}_{\mathcal{I}} \in \mathbb{C}^{K}$ represents the $K$ nonzero entries of
$\mathbf{R}\mathbf{b}$. We next define the $(K, \epsilon, \delta)$-Statistical
Orthogonality Condition (StOC).
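
The permutation representation can be illustrated with a few lines of Python. The sketch below builds
$\mathbf{P}_{\bar{\Pi}}$ for a random permutation (with 0-based indices), forms
$\mathbf{R}\mathbf{b} = \mathbf{P}_{\bar{\Pi}}^T \mathbf{z}$, and checks that the nonzero entries land on
the first $K$ elements of the permutation. The sizes, the nonzero values and the helper names are
hypothetical.

```python
import random


def permutation_matrix(perm):
    """P with rows e_{pi_1}^T, ..., e_{pi_n}^T (0-based), so that (P v)[k] = v[perm[k]]."""
    n = len(perm)
    return [[1 if j == perm[k] else 0 for j in range(n)] for k in range(n)]


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


n_tau, K = 6, 2
perm = random.sample(range(n_tau), n_tau)      # random permutation of {0, ..., n_tau - 1}
z = [1 + 1j, -2j] + [0] * (n_tau - K)          # K nonzero values first, then zeros
P = permutation_matrix(perm)
r_b = matvec(transpose(P), z)                  # R b = P^T z

assert matvec(P, r_b) == z                     # z = P R b
assert sorted(i for i, v in enumerate(r_b) if v != 0) == sorted(perm[:K])
```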

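Definition 1 (StOC). Let $\bar{\Pi}$ be a random permutation of $[N_\tau]$. Define
$\Pi = (\pi_1, \cdots, \pi_K)$ and $\Pi^c = (\pi_{K+1}, \cdots, \pi_{N_\tau})$ for any
$K \in [1, N_\tau]$. Then, the $M \times N_\tau$ (normalized) matrix $\mathbf{X}$ is said to satisfy the
$(K, \epsilon, \delta)$-statistical orthogonality condition if there exist $\epsilon, \delta \in [0, 1)$
such that the inequalities

$$\|(\mathbf{X}_{\Pi}^H \mathbf{X}_{\Pi} - \mathbf{I}_K)\mathbf{z}\|_\infty \le \epsilon \|\mathbf{z}\|_2 \quad \text{(StOC-1)} \tag{33}$$

$$\|\mathbf{X}_{\Pi^c}^H \mathbf{X}_{\Pi}\, \mathbf{z}\|_\infty \le \epsilon \|\mathbf{z}\|_2 \quad \text{(StOC-2)} \tag{34}$$

hold for every fixed $\mathbf{z} \in \mathbb{C}^K$ with probability exceeding $1 - \delta$ with respect to
the random permutation $\bar{\Pi}$.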

The StOC property has proved useful in obtaining average case performance guarantees [12], [20]. It
is similar in spirit to the Restricted Isometry Property (RIP) [6] which provides worst case guarantees in 
CS. An important difference between the two is that while we know of no effective algorithm for testing 
RIP, it is possible to infer StOC from matrix invariants that can be easily computed. 
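The matrix invariants in question are the worst-case coherence $\mu(X)$ and the average coherence $\nu(X)$. A minimal sketch of how they can be computed is given here, assuming NumPy and assuming the definitions $\mu(X) = \max_{i \ne j} |\langle x_i, x_j \rangle|$ and $\nu(X) = \frac{1}{N_\tau - 1} \max_i |\sum_{j \ne i} \langle x_i, x_j \rangle|$ for unit-norm columns, which is how the two quantities are used in the appendices; the function name `coherences` is hypothetical.

```python
import numpy as np

def coherences(X):
    """Worst-case and average coherence of X after column normalization."""
    X = X / np.linalg.norm(X, axis=0, keepdims=True)
    G = X.conj().T @ X                    # Gram matrix of the columns
    off = G - np.diag(np.diag(G))         # zero out the diagonal
    mu = np.max(np.abs(off))              # largest off-diagonal magnitude
    # Row i of `off` sums the inner products of column i with all others.
    nu = np.max(np.abs(off.sum(axis=1))) / (X.shape[1] - 1)
    return mu, nu

# An orthonormal basis has zero coherence.
mu_eye, nu_eye = coherences(np.eye(4))
# Two columns at 45 degrees: both quantities equal 1/sqrt(2).
mu2, nu2 = coherences(np.array([[1.0, 1.0], [0.0, 1.0]]))
```

Both quantities require only the Gram matrix, so the cost is polynomial in the matrix dimensions, in contrast with testing RIP.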

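If (33) and (34) hold for a realization of the permutation $\bar\Pi$, then for $1 \le k \le K$, let $\Pi_k = (\pi_1, \ldots, \pi_k)$ and $\Pi_k^c = (\pi_{k+1}, \ldots, \pi_K)$, so that $\Pi_k \cup \Pi_k^c = \Pi$ and $\Pi_k \cap \Pi_k^c = \emptyset$. For every $z \in \mathbb{C}^K$, we have from (33) that

$$\left\| \begin{bmatrix} X_{\Pi_k}^H X_{\Pi_k} - I_k & X_{\Pi_k}^H X_{\Pi_k^c} \\ X_{\Pi_k^c}^H X_{\Pi_k} & X_{\Pi_k^c}^H X_{\Pi_k^c} - I_{K-k} \end{bmatrix} z \right\|_\infty \le \epsilon \|z\|_2.$$

Therefore, for every $z \in \mathbb{C}^k$ (zero-padded to length $K$ in the display above), $\|(X_{\Pi_k}^H X_{\Pi_k} - I_k) z\|_\infty \le \epsilon \|z\|_2$ and $\|X_{\Pi_k^c}^H X_{\Pi_k} z\|_\infty \le \epsilon \|z\|_2$. Moreover, from (34) we have

$$\left\| \begin{bmatrix} X_{\Pi^c}^H X_{\Pi_k} & X_{\Pi^c}^H X_{\Pi_k^c} \end{bmatrix} z \right\|_\infty \le \epsilon \|z\|_2.$$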

We also need the following two lemmas. 



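Lemma 1. An $M \times N_\tau$ matrix $X$ satisfies $(K, \epsilon, \delta)$-StOC for any $\epsilon \in [0, 1)$ and $a \ge 1$ with

$$\delta \le 4 N_\tau \exp\left( -\frac{(\epsilon - \sqrt{K}\,\nu)^2}{16 (2 + a^{-1})^2 \mu^2} \right), \tag{35}$$

as long as $K \le \min\{\epsilon^2 \nu^{-2}, (1 + a)^{-1} N_\tau\}$.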



The proof for this lemma can be found in [12]. A consequence of this lemma is that if we let $K \le M/(2 \log N_\tau)$ and fix $\epsilon = 10\mu\sqrt{2 \log N_\tau}$, then the matrix $X$ satisfies $(K, \epsilon, \delta)$-StOC with $\delta \le 4N_\tau^{-1}$. Define the event $\mathcal{G}_1$ as:

$$\mathcal{G}_1 = \{X \text{ satisfies StOC-1 and StOC-2}\}. \tag{36}$$

Then $\mathcal{G}_1$ occurs with probability at least $1 - 4N_\tau^{-1}$ with respect to $\bar\Pi$ given the aforementioned choice of parameters.

In order to prove Theorem 3, we need an argument due to Tropp [21] that shows a random submatrix 
of X is well-conditioned with high probability. We follow the treatment given by Candes and Plan [22] 
where this argument appears in a slightly different form, given below. 

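Lemma 2 ([21], [22]). Let $\bar\Pi = (\pi_1, \ldots, \pi_{N_\tau})$ be a random permutation of $[N_\tau]$, and define $\Pi = (\pi_1, \ldots, \pi_K)$ for any $K \in [1, N_\tau]$. Then for $q = 2 \log N_\tau$ and $K \le N_\tau / (4\|X\|_2^2)$, we have

$$\left( \mathbb{E}\left[ \|X_\Pi^H X_\Pi - I_K\|_2^q \right] \right)^{1/q} \le 2^{1/q} \left( 30 \mu \log N_\tau + 13 \sqrt{\frac{2K \|X\|_2^2 \log N_\tau}{N_\tau}} \right) \tag{37}$$

with respect to the random permutation $\bar\Pi$.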



The following lemma [22] states a probabilistic bound on the extreme singular values of a random submatrix of $X$, obtained by applying the Markov inequality to Lemma 2:

$$\Pr\left( \|X_\Pi^H X_\Pi - I_K\|_2 \ge 1/2 \right) \le 2^q\, \mathbb{E}\left[ \|X_\Pi^H X_\Pi - I_K\|_2^q \right].$$

Lemma 3 ([22]). Let $\bar\Pi = (\pi_1, \ldots, \pi_{N_\tau})$ be a random permutation of $[N_\tau]$, and define $\Pi = (\pi_1, \ldots, \pi_K)$ for any $K \le N_\tau$. Suppose that $\mu \le 1/(240 \log N_\tau)$ and $K \le N_\tau / (c_2^2 \|X\|_2^2 \log N_\tau)$ for the numerical constant $c_2 = 104\sqrt{2}$. Then we have

$$\Pr\left( \|X_\Pi^H X_\Pi - I_K\|_2 \ge 1/2 \right) \le 2 N_\tau^{-2 \log 2}.$$
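The conditioning statement can be probed empirically by drawing random column subsets and measuring the spectral norm of $X_\Pi^H X_\Pi - I_K$. The following is a minimal sketch, assuming NumPy; the function names, the number of trials and the test matrix are illustrative assumptions and do not reproduce the constants of the lemma.

```python
import numpy as np

def submatrix_deviation(X, K, rng):
    """Spectral norm of X_Pi^H X_Pi - I_K for a random K-column subset."""
    pi = rng.permutation(X.shape[1])[:K]
    X_sub = X[:, pi]
    G = X_sub.conj().T @ X_sub
    return np.linalg.norm(G - np.eye(K), 2)

def fraction_well_conditioned(X, K, trials, seed=0):
    """Fraction of random subsets whose deviation is at most 1/2."""
    rng = np.random.default_rng(seed)
    devs = [submatrix_deviation(X, K, rng) for _ in range(trials)]
    return np.mean(np.array(devs) <= 0.5)

# Sanity check on a unitary DFT matrix: every column subset is orthonormal,
# so the deviation is numerically zero and the fraction is 1.
F = np.fft.fft(np.eye(16)) / 4.0
frac = fraction_well_conditioned(F, K=5, trials=20)
```

For a genuinely compressive matrix ($M < N_\tau$) the same routine gives an empirical estimate of how often the well-conditioned event occurs for a given $K$.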

Define the event

$$\mathcal{G}_2 = \left\{ \|X_\Pi^H X_\Pi - I_K\|_2 \le 1/2 \right\},$$

which happens with probability at least $1 - 2N_\tau^{-2\log 2} \ge 1 - 2N_\tau^{-1}$ with respect to $\bar\Pi$ from Lemma 3. Notice that under $\mathcal{G}_2$ all the eigenvalues of $X_\Pi^H X_\Pi$ are bounded in $[1/2, 3/2]$, so that $\|(X_\Pi^H X_\Pi)^{-1}\|_2 \le 2$ and $\|X_\Pi (X_\Pi^H X_\Pi)^{-1}\|_2 \le \sqrt{2}$. Moreover, for $1 \le k \le K$ and $\Pi_k = (\pi_1, \ldots, \pi_k)$, we have $\|X_{\Pi_k}^H X_{\Pi_k} - I_k\|_2 \le 1/2$, since the eigenvalues of $X_{\Pi_k}^H X_{\Pi_k}$ are majorized by the eigenvalues of $X_\Pi^H X_\Pi$ [23].

Finally, we need that the noise is bounded with high probability.

Lemma 4. Let $P \in \mathbb{C}^{M \times M}$ be a projection matrix such that $P^2 = P$. Let $w \sim \mathcal{CN}(0, \sigma^2 I)$, and let $X$ be a unit-column matrix. Then for $\tau > 0$ we have

$$\Pr\left( \|X^H P w\|_\infty < \tau \right) \ge 1 - \frac{N_\tau}{\pi} e^{-\tau^2/\sigma^2},$$

provided the right hand side is greater than zero.

Proof: See Appendix VIII-B. ■

Now let $\tau = \sigma\sqrt{2 \log N_\tau}$, and define

$$\mathcal{H}_0 = \left\{ \|X^H P w\|_\infty < \tau \right\}.$$

It follows from Lemma 4 that $\mathcal{H}_0$ occurs with probability at least $1 - \pi^{-1} N_\tau^{-1}$.
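The choice of threshold can be made concrete with a short calculation. The sketch below takes the bound of Lemma 4 to be $1 - (N_\tau/\pi) e^{-\tau^2/\sigma^2}$ and evaluates it at $\tau = \sigma\sqrt{2 \log N_\tau}$; the function names and the numerical values of $\sigma$ and $N_\tau$ are illustrative assumptions.

```python
import math

def noise_threshold(sigma, n_cols):
    """tau = sigma * sqrt(2 log N_tau)."""
    return sigma * math.sqrt(2.0 * math.log(n_cols))

def lemma4_lower_bound(tau, sigma, n_cols):
    """Lower bound 1 - (N_tau / pi) * exp(-tau^2 / sigma^2)."""
    return 1.0 - (n_cols / math.pi) * math.exp(-(tau / sigma) ** 2)

n_cols = 1024   # illustrative N_tau
sigma = 0.1     # illustrative noise standard deviation
tau = noise_threshold(sigma, n_cols)
bound = lemma4_lower_bound(tau, sigma, n_cols)
# With this tau, exp(-tau^2 / sigma^2) = N_tau^{-2}, so the bound
# simplifies to 1 - 1 / (pi * N_tau).
```

The threshold grows only as $\sqrt{\log N_\tau}$, while the failure probability decays as $1/N_\tau$.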



B. Proof of Theorem 1 

When applied to Algorithm 1, the next lemma shows that under appropriate conditions, ranking the inner products between $x_n$ and $y$ is an effective method of detecting the set of active users.

Lemma 5. Let $b$ be a vector with support $\mathcal{I}$ corresponding to $K$ active users, and let $y$ be a noisy measurement as in (17). Suppose that

$$|r|_{(1)} - 2\epsilon \|r_{\mathcal{I}}\|_2 > 2\tau. \tag{38}$$

Then, if the event $\mathcal{G}_1 \cap \mathcal{H}_0$ occurs, we have

$$\max_{n \in \mathcal{I}} |x_n^H y| > \max_{n \notin \mathcal{I}} |x_n^H y|, \tag{39}$$

and $\mathrm{sgn}(\Re[r_{n_1}^* x_{n_1}^H y]) = \sqrt{2}\,\Re[b_{n_1}]$, $\mathrm{sgn}(\Im[r_{n_1}^* x_{n_1}^H y]) = \sqrt{2}\,\Im[b_{n_1}]$, for

$$n_1 = \arg\max_n |x_n^H y|. \tag{40}$$

Proof: See Appendix VIII-C. ■

We now prove the performance guarantee for the coherent detector in Algorithm 1. First we show that under the event $\mathcal{G}_1 \cap \mathcal{H}_0$, which happens with probability at least $1 - (\pi^{-1} + 4) N_\tau^{-1}$, Algorithm 1 correctly detects all active users and symbols in the first $K$ iterations. Define a subset $\Pi_k$ which contains the $k$ variables that are selected until the $k$th iteration, $0 \le k \le K$.



March 20, 2013 



DRAFT 






We want to prove $\Pi_k \subset \Pi$ by induction. First at $k = 0$, $\Pi_0 = \emptyset \subset \Pi$. Suppose we are currently at the $k$th iteration of Algorithm 1, $0 \le k \le K - 1$, and assume that $\Pi_k \subset \Pi$. The $k$th step is to detect the user with the largest $|x_n^H v_k|$. We have

$$v_k = X(b - b_k) + w = X\eta_k + w, \tag{41}$$

where $\eta_k = b - b_k$. This vector has support $\Pi_k^c = \Pi \setminus \Pi_k$ and has at most $(K - k)$ non-zero elements, since $b_k$ contains correct symbols at the correct locations for $k$ active users, i.e. $[b_k]_n = [b]_n$ for $n \in \Pi_k$.

This $v_k$ is a noisy measurement of the vector $X\eta_k$. The signal model in (41) for the $k$th iteration is identical to the signal model in the first iteration with $b$ replaced by $\eta_k$ (with a smaller sparsity $K - k$ rather than $K$), $\Pi$ replaced by $\Pi_k^c$, and $y$ replaced by $v_k$. Hence, from Lemma 5 we have that under the condition

$$\|r_{\Pi_k^c}\|_\infty - 2\epsilon \|r_{\Pi_k^c}\|_2 > 2\tau, \tag{42}$$

we have

$$\max_{n \in \Pi_k^c} |x_n^H v_k| > \max_{n \notin \Pi_k^c} |x_n^H v_k|, \tag{43}$$

i.e. Algorithm 1 can detect an active user correctly, and no index of an active user that has been detected before will be chosen again. Note that $\|r_{\Pi_k^c}\|_\infty \ge |r|_{(k+1)}$ and $\|r_{\Pi_k^c}\|_2 \le \sqrt{K - k}\,|r|_{(k+1)}$, so (42) is satisfied by

$$|r|_{(k+1)} > 2\epsilon\sqrt{K - k}\,|r|_{(k+1)} + 2\tau.$$

Since $K \le 1/(c_1^2 \mu^2 \log N_\tau)$ and $\epsilon = 10\mu\sqrt{2 \log N_\tau}$, this is equivalent to the condition in (25) for $0 \le k \le K - 1$; therefore a correct user is selected at the $k$th iteration, so that $\Pi_{k+1} \subset \Pi$. On the other hand, since condition (38) is true, the symbol can be detected correctly as well. Then we have that under the event $\mathcal{G}_1 \cap \mathcal{H}_0$, $\mathrm{sgn}(\Re[r_{n_1}^* x_{n_1}^H y]) = \sqrt{2}\,\Re[b_{n_1}]$ and $\mathrm{sgn}(\Im[r_{n_1}^* x_{n_1}^H y]) = \sqrt{2}\,\Im[b_{n_1}]$, that is, $\mathcal{G}_1 \cap \mathcal{H}_0 \subset \{\hat b_{n_1} = b_{n_1}\}$. By induction, since no active users will be detected twice, it follows that the first $K$ steps of Algorithm 1 can detect all active users.
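The induction argument can be followed on a small example. The following is a minimal sketch of a coherent greedy detector in the spirit of the steps analyzed above, assuming NumPy, QPSK symbols $b_n \in \{(\pm 1 \pm j)/\sqrt{2}\}$, known gains $r_n$, and a residual update $v_k = y - XRb_k$; the function name `coherent_detect` and these details are assumptions for illustration and are not a transcription of Algorithm 1.

```python
import numpy as np

def coherent_detect(X, y, r, K):
    """Greedy detection of K active users and their QPSK symbols (sketch)."""
    N = X.shape[1]
    b_hat = np.zeros(N, dtype=complex)
    v = y.copy()
    support = []
    for _ in range(K):
        corr = X.conj().T @ v                 # inner products x_n^H v_k
        n = int(np.argmax(np.abs(corr)))      # strongest remaining user
        stat = np.conj(r[n]) * corr[n]        # r_n^* x_n^H v_k
        re = 1.0 if stat.real >= 0 else -1.0  # sign test on the real part
        im = 1.0 if stat.imag >= 0 else -1.0  # sign test on the imaginary part
        b_hat[n] = (re + 1j * im) / np.sqrt(2)
        support.append(n)
        v = y - X @ (r * b_hat)               # remove detected users
    return support, b_hat

# Noise-free sanity check with orthonormal signatures (illustrative only).
X = np.eye(6, dtype=complex)
r = np.ones(6)
r[1] = 2.0                                    # user 1 is the strongest
b = np.zeros(6, dtype=complex)
b[1] = (1 - 1j) / np.sqrt(2)
b[4] = (-1 + 1j) / np.sqrt(2)
y = X @ (r * b)
support, b_hat = coherent_detect(X, y, r, K=2)
```

In this toy case the strongest user is detected first and removed from the residual, after which the weaker user becomes the maximizer, mirroring the ordering used in the proof.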

C. Proof of Theorem 3 

We note that in Algorithm 2, the residual $v_k$, $k = 0, \ldots, K - 1$, is orthogonal to the columns selected in previous iterations, so in each iteration a new column will be selected. Define a subset $\Pi_k$ which contains the $k$ variables that are selected until the $k$th iteration. Then $P_k = X_{\Pi_k}(X_{\Pi_k}^H X_{\Pi_k})^{-1} X_{\Pi_k}^H$ is the projection matrix onto the linear subspace spanned by the columns of $X_{\Pi_k}$, and we assume $P_0 = 0$.

Again we want to prove $\Pi_k \subset \Pi$ by induction. First at $k = 0$, $\Pi_0 = \emptyset \subset \Pi$. Assume that at the $k$th iteration, $0 \le k \le K - 1$, $\Pi_k \subset \Pi$; then the residual can be written as

$$v_k = (I - P_k) y = (I - P_k) X_\Pi r_{\mathcal{I}} + (I - P_k) w = s_k + n_k,$$

where $s_k = (I - P_k) X_\Pi r_{\mathcal{I}}$ and $n_k = (I - P_k) w$ are the signal and noise components respectively at the $k$th iteration.

Let $M_\Pi^k = \|X_\Pi^H s_k\|_\infty$, $M_{\Pi^c}^k = \|X_{\Pi^c}^H s_k\|_\infty$ and $N^k = \|X^H n_k\|_\infty$; then a sufficient condition for $\Pi_{k+1} \subset \Pi$, i.e. for Algorithm 2 to select a correct active user at the next iteration, is that

$$M_\Pi^k - M_{\Pi^c}^k > 2N^k, \tag{44}$$

since under (44) we have

$$\|X_\Pi^H v_k\|_\infty \ge M_\Pi^k - N^k > M_{\Pi^c}^k + N^k \ge \|X_{\Pi^c}^H v_k\|_\infty.$$

Let the event $\mathcal{G} = \mathcal{G}_1 \cap \mathcal{G}_2$. From Lemma 1 and Lemma 3 the event $\mathcal{G}$ holds with probability at least $1 - 4N_\tau^{-1} - 2N_\tau^{-2\log 2}$ with respect to $\bar\Pi$.

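Now we bound $M_\Pi^k$ and $M_{\Pi^c}^k$ under the event $\mathcal{G}$. Let $\Pi_k^c = \Pi \setminus \Pi_k$ be the index set of yet to be selected active users, and $r_{\Pi_k^c}$ be the corresponding coefficients. We can find a vector $z$ of dimension $(K - k)$ such that $X_{\Pi^c}^H X_{\Pi_k^c} z = X_{\Pi^c}^H (I - P_k) X_\Pi r_{\mathcal{I}}$, where the vector $z$ can be written as

$$z = (X_{\Pi_k^c}^H X_{\Pi_k^c})^{-1} X_{\Pi_k^c}^H (I - P_k) X_{\Pi_k^c} r_{\Pi_k^c}.$$

We have

$$\|z\|_2 \le \|(X_{\Pi_k^c}^H X_{\Pi_k^c})^{-1}\|_2 \, \|X_{\Pi_k^c}^H (I - P_k) X_{\Pi_k^c} r_{\Pi_k^c}\|_2 \tag{45}$$
$$\le 2 \|X_\Pi^H X_\Pi\|_2 \|r_{\Pi_k^c}\|_2 \le 3 \|r_{\Pi_k^c}\|_2, \tag{46}$$

where (45) follows from

$$\|X_{\Pi_k^c}^H (I - P_k) X_{\Pi_k^c}\|_2 \le \|X_\Pi^H X_\Pi\|_2,$$

whose proof can be found in [24], and (46) follows from Lemma 3. Also,

$$\begin{aligned}
\|X_{\Pi_k^c}^H P_k X_{\Pi_k^c} r_{\Pi_k^c}\|_\infty
&= \|X_{\Pi_k^c}^H X_{\Pi_k} (X_{\Pi_k}^H X_{\Pi_k})^{-1} X_{\Pi_k}^H X_{\Pi_k^c} r_{\Pi_k^c}\|_\infty \\
&\le \epsilon \|(X_{\Pi_k}^H X_{\Pi_k})^{-1} X_{\Pi_k}^H X_{\Pi_k^c} r_{\Pi_k^c}\|_2 \\
&\le \epsilon \|(X_{\Pi_k}^H X_{\Pi_k})^{-1}\|_2 \|X_{\Pi_k}^H X_{\Pi_k^c}\|_2 \|r_{\Pi_k^c}\|_2 \\
&\le \epsilon \|r_{\Pi_k^c}\|_2,
\end{aligned}$$

therefore $M_\Pi^k$ can be bounded as

$$\begin{aligned}
M_\Pi^k &\ge \|r_{\Pi_k^c}\|_\infty - \|(X_{\Pi_k^c}^H X_{\Pi_k^c} - I) r_{\Pi_k^c}\|_\infty - \|X_{\Pi_k^c}^H P_k X_{\Pi_k^c} r_{\Pi_k^c}\|_\infty \\
&\ge \|r_{\Pi_k^c}\|_\infty - 2\epsilon \|r_{\Pi_k^c}\|_2,
\end{aligned} \tag{47}$$

where (47) follows from (33). Next, $M_{\Pi^c}^k$ can be bounded as

$$M_{\Pi^c}^k = \|X_{\Pi^c}^H (I - P_k) X_\Pi r_{\mathcal{I}}\|_\infty = \|X_{\Pi^c}^H X_{\Pi_k^c} z\|_\infty \le \epsilon \|z\|_2 \le 3\epsilon \|r_{\Pi_k^c}\|_2, \tag{48}$$

where (48) follows from (34).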

Conditioned on the event $\mathcal{G}$, for each $P_k$, since $I - P_k$ is also a projection matrix, define the event

$$\mathcal{H}_k = \{N^k < \tau\}, \quad k = 0, \ldots, K - 1. \tag{49}$$

Then from Lemma 4, $\mathcal{H}_k$ happens with probability at least $1 - \pi^{-1} N_\tau^{-1}$ with respect to $w$. We further define the event $\mathcal{H} = \cap_{k=0}^{K-1} \mathcal{H}_k$; then from the union bound $\Pr(\mathcal{H} \mid \mathcal{G}) = \Pr(\mathcal{H}) \ge 1 - K\pi^{-1} N_\tau^{-1}$.

From the above discussions, the event $\mathcal{G} \cap \mathcal{H}$ happens with probability $\Pr(\mathcal{G} \cap \mathcal{H}) \ge 1 - K\pi^{-1} N_\tau^{-1} - 2N_\tau^{-2\log 2} - 4N_\tau^{-1} \ge 1 - (K\pi^{-1} + 6) N_\tau^{-1}$. Now we are ready to analyze the performance of Algorithm 2 under the event $\mathcal{G} \cap \mathcal{H}$. Substituting the bounds (47), (48) and (49) into (44), it is sufficient that at the $k$th iteration

$$\|r_{\Pi_k^c}\|_\infty > 5\epsilon \|r_{\Pi_k^c}\|_2 + 2\tau. \tag{50}$$

Note that $\|r_{\Pi_k^c}\|_\infty \ge |r|_{(k+1)}$ and $\|r_{\Pi_k^c}\|_2 \le \sqrt{K - k}\,|r|_{(k+1)}$, so (50) is satisfied by

$$|r|_{(k+1)} > 5\epsilon\sqrt{K - k}\,|r|_{(k+1)} + 2\tau.$$

Since $K \le 1/(c_1^2 \mu^2 \log N_\tau)$ and $\epsilon = 10\mu\sqrt{2 \log N_\tau}$, this is equivalent to the condition in (29) for $0 \le k \le K - 1$; therefore a correct user is selected at the $k$th iteration, so that $\Pi_{k+1} \subset \Pi$. Since the number of active users is $K$, Algorithm 2 successfully finds $\Pi$ in $K$ iterations under the event $\mathcal{G} \cap \mathcal{H}$, and we have proved Theorem 3.
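The structure used in this proof (select the column most correlated with the residual, then project the measurement onto the orthogonal complement of the selected columns) can be sketched as follows. This is a minimal orthogonal-matching-pursuit style illustration, assuming NumPy; the function name `noncoherent_detect` and the test setup are assumptions and do not reproduce Algorithm 2 line by line.

```python
import numpy as np

def noncoherent_detect(X, y, K):
    """Support detection by projecting out selected columns (sketch)."""
    support = []
    v = y.copy()
    for _ in range(K):
        corr = np.abs(X.conj().T @ v)    # |x_n^H v_k| for every column
        n = int(np.argmax(corr))
        support.append(n)
        X_sel = X[:, support]
        # Least-squares fit on the selected columns yields P_k y, so the
        # new residual is v_k = (I - P_k) y.
        coef, *_ = np.linalg.lstsq(X_sel, y, rcond=None)
        v = y - X_sel @ coef
    return sorted(support)

# Noise-free sanity check with a unitary DFT matrix (illustrative only):
# two active users with unequal, unknown complex gains.
F = np.fft.fft(np.eye(8)) / np.sqrt(8)
y = F[:, [2, 5]] @ np.array([1.0, 0.3j])
detected = noncoherent_detect(F, y, K=2)
```

No knowledge of the gains is needed: the projection step removes the contribution of every detected user regardless of its amplitude or phase.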

VI. Deterministic Signature Waveforms 

A. Gabor Frames 

In the following we will construct the signature sequences $a_n$ from Gabor frames. Let $g \in \mathbb{C}^P$ be a seed vector with each entry $|g_n|^2 = 1/M$ and let $T(g) \in \mathbb{C}^{P \times P}$ be the circulant matrix generated from $g$ as $T(g) = [T_0 g \; \cdots \; T_{P-1} g]$. Its eigen-decomposition can be written as

$$T(g) = F \,\mathrm{diag}(F^H g)\, F^H = F \,\mathrm{diag}(\hat g)\, F^H,$$

where $F = \frac{1}{\sqrt{P}}[\omega_0, \omega_1, \ldots, \omega_{P-1}]$ is the DFT matrix with columns $\omega_m$, $m = 0, 1, \ldots, P - 1$.

Define corresponding diagonal matrices $W_m = \mathrm{diag}[\omega_m]$, for $m = 0, 1, \ldots, P - 1$. Then the Gabor frame $\Phi = [\phi_m]$ generated from $g$ is a $P \times P^2$ block matrix of the form

$$\Phi = [W_0 T(g), W_1 T(g), \ldots, W_{P-1} T(g)], \tag{51}$$

where each column has norm $\sqrt{P/M}$. When we apply the DFT to the Gabor frame and obtain $\hat\Phi = F^H \Phi$, the order of time-shift and frequency modulation is reversed, and therefore $\hat\Phi$ is composed of circulant matrices after appropriate ordering of columns. In fact, if we index each column $m$ of the $P \times P^2$ matrix $\Phi$ by $m = Pq + \ell$, the matrix $\Phi_\ell$ is obtained by keeping all columns with index congruent to $\ell$ (mod $P$). So $\Phi_\ell$ can be written as

$$\Phi_\ell = \sqrt{P}\,\mathrm{diag}(S^\ell g)\, F,$$

where $S$ is the right-shift matrix by one, and its DFT transform

$$\hat\Phi_\ell = F^H \Phi_\ell = \sqrt{P}\, T(W_\ell \hat g)$$

is a circulant matrix. We use $[\hat\Phi_0, \ldots, \hat\Phi_{P-1}]$ as the matrix $A$.

At the receiver, a random partial DFT is applied to the received symbol, so $H$ is a partial DFT matrix, and the resulting matrix $X$ is a subsampled Gabor frame as defined in (51), with unit-norm columns. The maximum discrete delay $\tau$ which this Gabor frame construction can support is $P - 1$, where $W_\ell \hat g$ can be assigned as the signature sequence of a user, so the maximum number of total users should satisfy $N \le P$. In general, if $\tau < P - 1$, we can split $\hat\Phi_\ell$ into blocks to support multiple users, and send $T_{d(\tau+1)} W_\ell \hat g$ as signature sequences for $d = 0, \ldots, \lfloor P/(\tau + 1) \rfloor - 1$ and $\ell = 1, \ldots, P$, so the maximum number of total users satisfies $N \le P \lfloor P/(\tau + 1) \rfloor$ in general.
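The block structure of the Gabor frame and the resulting user count can be illustrated with a short construction. The following is a minimal sketch, assuming NumPy, cyclic shifts for $T(g)$, and the modulation $\omega_m(n) = e^{j2\pi mn/P}$ (the sign convention is an assumption); the function names `gabor_frame` and `max_total_users` are hypothetical.

```python
import numpy as np

def gabor_frame(g):
    """Build the P x P^2 frame [W_0 T(g), ..., W_{P-1} T(g)] from seed g."""
    P = len(g)
    # T(g): column k is g cyclically shifted by k positions.
    T = np.stack([np.roll(g, k) for k in range(P)], axis=1)
    n = np.arange(P)
    blocks = []
    for m in range(P):
        omega_m = np.exp(2j * np.pi * m * n / P)   # assumed sign convention
        blocks.append(omega_m[:, None] * T)        # W_m T(g)
    return np.hstack(blocks)

def max_total_users(P, tau):
    """Upper limit P * floor(P / (tau + 1)) on the total number of users."""
    return P * (P // (tau + 1))

# Small illustrative example with |g_n|^2 = 1/M.
rng = np.random.default_rng(1)
P, M = 8, 4
g = np.exp(2j * np.pi * rng.uniform(size=P)) / np.sqrt(M)
Phi = gabor_frame(g)
col_norms = np.linalg.norm(Phi, axis=0)   # every column has norm sqrt(P/M)
```

For example, `max_total_users(128, 15)` gives 1024 and `max_total_users(128, 127)` gives 128, matching the two regimes discussed above.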

Now we consider the coherence properties of X. We have the following proposition. 

Proposition 1. Let $X$ be a unit-column matrix with $M$ rows subsampled uniformly at random from a Gabor frame $\Phi$ that satisfies the strong coherence property. If $M \ge \gamma \log^3 P$ for some constant $\gamma$, then with probability at least $1 - 2P^{-1}$, we have $\mu(X) \le \gamma_2 / \log P$ for some constant $\gamma_2$, and $\nu(X) \le 2/M$ deterministically.

Proof: See Appendix VIII-D. ■ 
It is established in [12] that Gabor frames satisfy the strong coherence property when the seed sequence is the Alltop sequence, or with high probability when the seed sequence is randomly generated. Proposition 1 implies that we can find an $M$ such that the subsampled Gabor frame satisfies the (strong) coherence property as long as $M$ is not too small.

B. Kerdock Codes 

The set of Kerdock codewords is given as the columns of the matrix $\Psi \in \{\pm 1, \pm j\}^{P \times P^2}$, where $P = 2^m$. Since the Kerdock code is a cyclic extended code over $\mathbb{Z}_4$, we can find a map of the columns of the Kerdock code into $P$ blocks (Theorem 10, [14]), such that the $P - 1$ columns within each block are cyclic. Then we can assign adjacent codewords in the same cyclic block to one user, and set the first codeword as the user's transmitted codeword. We denote the final code book as $\Phi$, with entries in $\{\pm 1, \pm j\}$, and we collect the discarded $P$ columns in a separate set.

The coherence property of the subsampled Kerdock code set is summarized in the Proposition below. 

Proposition 2. Let $X$ be a unit-column matrix with $M$ rows subsampled uniformly at random from a Kerdock code $\Psi$. If $M \ge \gamma \log^3 P$ for some constant $\gamma$, then with probability at least $1 - 2P^{-1}$, we have $\mu(X) \le \gamma_2 / \log P$ for some constant $\gamma_2$, and $\nu(X) \le 2/M$ deterministically.

Proof: See Appendix VIII-E. ■

The worst-case coherence of the Kerdock code set meets the Welch bound, $\mu = 1/\sqrt{P}$, and its average coherence is $\nu = 1/P$. Proposition 2 implies that we can find an $M$ such that the subsampled Kerdock code set satisfies the (strong) coherence property as long as $M$ is not too small.

VII. Numerical Examples 

A. Gabor Signature Waveforms

We first consider the case where each circulant matrix in the Gabor frame supports only one user. This corresponds to the maximum delay the algorithm can handle in the asynchronous case. Let the seed vector $g$ for the Gabor frame be either an Alltop sequence of length $P = 127$, or a unit vector with random uniform phase of length $P = 128$, given as

$$g = \frac{1}{\sqrt{P}} \left[ e^{j2\pi\theta_1}, e^{j2\pi\theta_2}, \ldots, e^{j2\pi\theta_P} \right], \tag{52}$$

where $\theta_i$ is uniformly distributed on $[0, 1]$, $1 \le i \le P$. The power profile is assumed known as $r_n = 1$ for all $n = 1, \ldots, N_\tau$ in the coherent case, and is assumed unknown in the noncoherent case.
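The two kinds of seed vector can be generated as follows. This is a minimal sketch, assuming NumPy; the random-phase seed follows (52), while the Alltop seed uses the cubic-phase form $g_n = e^{j2\pi n^3/P}/\sqrt{P}$ for prime $P$, which is a standard definition of the Alltop sequence and is stated here as an assumption about the construction used. The function names are hypothetical.

```python
import numpy as np

def alltop_seed(P):
    """Cubic-phase (Alltop) sequence of prime length P with unit norm."""
    n = np.arange(P)
    # Reduce n^3 modulo P in integer arithmetic to keep the phase accurate.
    return np.exp(2j * np.pi * ((n ** 3) % P) / P) / np.sqrt(P)

def random_phase_seed(P, rng):
    """Unit-norm vector with i.i.d. uniform phases, as in (52)."""
    theta = rng.uniform(0.0, 1.0, size=P)
    return np.exp(2j * np.pi * theta) / np.sqrt(P)

g_alltop = alltop_seed(127)
g_random = random_phase_seed(128, np.random.default_rng(0))
```

Both seeds have constant-modulus entries; for the Alltop seed, the inner product between the sequence and any nontrivial cyclic shift of itself has magnitude $1/\sqrt{P}$, which is the source of its favorable coherence.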

The active users are selected first by uniformly choosing a number at random from 1 to $P$, and then, for each active user, the delay is chosen uniformly at random. First, we fix the number of active users, namely $K = 2$, and apply the coherent detector described in Algorithm 1 and the noncoherent detector described in Algorithm 2 for SNR = 20dB and SNR = 40dB. The partial DFT matrix is applied with randomly selected rows and the number of Monte Carlo runs is 5,000. Fig. 3 shows the probability of error for multi-user detection with respect to the number of measurements. The performance of the Alltop Gabor frame is better than that of the random Gabor frame due to its optimal coherence. It is also worth noting that the performance of the noncoherent detector is almost the same as that of the coherent detector, although it does not perform symbol detection. This may suggest that channel state information and power control are less important in sparse recovery of active users.

Finally, we consider the case where the maximum delay is relatively small, for example $\tau = 15$ when $P = 128$ for a random Gabor frame. We transmit the first sequence within the block of the circulant matrix, resulting in a total number of $P^2/(\tau + 1) = 1024$ users. Fig. 4 and Fig. 5 show the probability of error for multi-user detection with respect to the number of active users $K$ for different numbers of random measurements $M = 40, 60, 80$ when SNR = 20dB and SNR = 40dB respectively.

Fig. 3: Probability of error for multi-user detection with respect to the number of measurements from coherent and noncoherent detectors using (a) an Alltop Gabor frame with length $P = 127$, and (b) a random Gabor frame with length $P = 128$, for $K = 2$ active users and SNR = 20dB and 40dB, where the maximum chip delay is $\tau = 126$.




[Figure 4 panels: (a) Coherent detector, (b) Noncoherent detector; horizontal axis: Number of Active Users.]

Fig. 4: Probability of error for coherent and noncoherent multi-user detection with respect to the number of active users using a random Gabor frame with $P = 128$ for $M = 40, 60, 80$ when SNR = 20dB, where the maximum chip delay is $\tau = 15$. The total number of users is 1024.



B. Kerdock Signature Waveforms 

We first generate a Kerdock code set $\Psi$ of length $P = 128$ with $P^2$ codewords. By removing the all-one row in $\Psi$ and removing two columns in each block of size $P$, we obtain a block-circulant matrix of size $(P - 1) \times P(P - 2)$, where there are $P$ circulant blocks of size $(P - 1) \times (P - 2)$. As earlier, we assume the maximal delay is $\tau = 15$, so the total number of users is given as $\lfloor (P - 2)/(\tau + 1) \rfloor \cdot P = 896$.

Fig. 5: Probability of error for coherent and noncoherent multi-user detection with respect to the number of active users using a random Gabor frame with $P = 128$ for $M = 40, 60, 80$ when SNR = 40dB, where the maximum chip delay is $\tau = 15$. The total number of users is 1024.



Fig. 6 shows the probability of error for multi-user detection with respect to the number of active users $K$ for different numbers of random measurements $M = 20, 40, 60$ when SNR = 20dB.




[Figure 6 panels: (a) Coherent detector, (b) Noncoherent detector; horizontal axis: Number of Active Users.]

Fig. 6: Probability of error for coherent and noncoherent multi-user detection with respect to the number of active users $K$ using a Kerdock code set with $P = 127$ for $M = 20, 40, 60$ when SNR = 20dB, where the maximum chip delay is $\tau = 15$. The total number of users is 896.



C. Comparison of Signature Waveforms 

In this section we compare the performance of different signatures for multi-user detection when SNR = 20dB and $K = 2$. We use the above considered Kerdock code, Alltop Gabor frame and random Gabor frame when $P = 128$. We also consider the cyclic extensions of a random matrix whose columns are generated from (52). Table II summarizes the total number of users for the different signature waveforms; notice that both Kerdock and Alltop suffer from the floor operation in calculating the number of total users. As shown in Fig. 7, the performance of the Kerdock code is significantly better than that of the other choices. The performance of cyclic extensions of random matrices and that of Gabor frames are similar, since the subsampling degrades the optimal coherence properties of the unsampled Gabor frame. The Alltop Gabor frame is slightly better than its random counterparts.

Signatures        # of total users
Kerdock           896
Random Block      1024
Alltop Gabor      889
Random Gabor      1024

TABLE II: Total number of users for different signatures.
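The user counts reported for the different signatures follow from the floor operation applied to each circulant block. A short sketch of the arithmetic is given here; the function name `total_users` is hypothetical, and the block sizes assigned to the "Random Block" construction are an assumption (taken equal to those of the random Gabor frame).

```python
def total_users(block_len, n_blocks, tau):
    """Users supported when each circulant block of block_len shifts
    is split into groups of tau + 1 consecutive shifts."""
    return (block_len // (tau + 1)) * n_blocks

tau = 15
counts = {
    # Kerdock: P = 128 blocks, each with P - 2 = 126 usable columns.
    "Kerdock": total_users(126, 128, tau),
    # Random block: assumed 128 blocks of 128 cyclic shifts.
    "Random Block": total_users(128, 128, tau),
    # Alltop Gabor: prime length P = 127, 127 blocks of 127 shifts.
    "Alltop Gabor": total_users(127, 127, tau),
    # Random Gabor: P = 128, 128 blocks of 128 shifts.
    "Random Gabor": total_users(128, 128, tau),
}
```

The loss for the Kerdock and Alltop constructions comes entirely from $\lfloor 126/16 \rfloor = \lfloor 127/16 \rfloor = 7$, compared with $128/16 = 8$ for the length-128 constructions.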




[Figure 7 panels: (a) Coherent detector, (b) Noncoherent detector; horizontal axis: Number of Measurements.]

Fig. 7: Comparison of performance with respect to the number of measurements for multi-user detection when $K = 2$ and SNR = 20dB, where the maximum chip delay is $\tau = 15$.



VIII. Conclusions 

This paper describes two MUD front-end architectures that lead to mathematically equivalent discrete signal models. Both coherent and noncoherent detectors based on iterative matching pursuit are presented to recover active users, and the transmitted symbols are also detected in the coherent case. It is shown that compressive demodulation requires $O(K \log N_\tau)$ samples to recover $K$ active users. Gabor frames and Kerdock codes are proposed as signature waveforms, and numerical examples are provided in which the superior performance of Kerdock codes is emphasized. The resilience of iterative matching pursuit to variability in the relative strength of the entries of the signal might be an advantage in multi-user detection in wireless communications because it makes power control less critical. We make the final remark that the noncoherent detectors can be extended to detect transmitted symbols by assigning two different signature waveforms to the BPSK signaling.

Appendix 

A. Sidak's lemma 

Lemma 6 (Sidak's lemma) [25]. Let $[X_1, \ldots, X_n]$ be a vector of random multivariate normal variables with zero means, arbitrary variances $\sigma_1^2, \ldots, \sigma_n^2$, and an arbitrary correlation matrix. Then, for any positive numbers $c_1, \ldots, c_n$, we have

$$\Pr(|X_1| \le c_1, \ldots, |X_n| \le c_n) \ge \prod_{i=1}^{n} \Pr(|X_i| \le c_i).$$

B. Proof of Lemma 4 

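Since $w \sim \mathcal{CN}(0, \sigma^2 I_M)$, we have $X^H P w \sim \mathcal{CN}(0, \sigma^2 X^H P X)$, which is a colored Gaussian noise. We want to bound $\Pr(\|X^H P w\|_\infty > \tau)$ for some $\tau > 0$. Note that each $x_n^H P w \sim \mathcal{CN}(0, \sigma_n^2)$, where $\sigma_n^2 = \sigma^2 x_n^H P x_n \le \sigma^2$ (recall that $\|x_n\|_2 = 1$). Then

$$\Pr(|x_n^H P w| < \tau) = 1 - \frac{1}{\pi} e^{-\tau^2/\sigma_n^2} \ge 1 - \frac{1}{\pi} e^{-\tau^2/\sigma^2}.$$

Following Lemma 6, for $\tau > 0$ we have

$$\Pr(\|X^H P w\|_\infty < \tau) \ge \prod_{n=1}^{N_\tau} \Pr(|x_n^H P w| < \tau) \ge \left( 1 - \frac{1}{\pi} e^{-\tau^2/\sigma^2} \right)^{N_\tau} \ge 1 - \frac{N_\tau}{\pi} e^{-\tau^2/\sigma^2},$$

provided the right hand side is greater than zero.

C. Proof of Lemma 5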

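We begin by deriving a lower bound for $\max_{n \in \mathcal{I}} |x_n^H y|$ when $\mathcal{G}_1 \cap \mathcal{H}_0$ occurs. Assume that $n_0$ is the index achieving the largest absolute gain: $|r_{n_0}| = |r|_{(1)}$. Recall that $r_{\mathcal{I}}$ collects the nonzero entries of $Rb$. Then under the event $\mathcal{G}_1 \cap \mathcal{H}_0$:

$$\begin{aligned}
\max_{n \in \mathcal{I}} |x_n^H y| &\ge |x_{n_0}^H y| \\
&\ge |r|_{(1)} - \Big| \sum_{m \in \mathcal{I},\, m \ne n_0} x_{n_0}^H x_m [Rb]_m \Big| - |x_{n_0}^H w| \\
&\ge |r|_{(1)} - \|(X_\Pi^H X_\Pi - I) r_{\mathcal{I}}\|_\infty - \|X_\Pi^H w\|_\infty \\
&\ge |r|_{(1)} - \epsilon \|r_{\mathcal{I}}\|_2 - \tau.
\end{aligned} \tag{53}$$

On the other hand, we can similarly expand and upper-bound $\max_{n \notin \mathcal{I}} |x_n^H y|$ under the event $\mathcal{G}_1 \cap \mathcal{H}_0$ as

$$\begin{aligned}
\max_{n \notin \mathcal{I}} |x_n^H y| &= \max_{n \notin \mathcal{I}} \Big| \sum_{m \in \mathcal{I}} x_n^H x_m [Rb]_m + x_n^H w \Big| \\
&\le \max_{n \notin \mathcal{I}} \Big| \sum_{m \in \mathcal{I}} x_n^H x_m [Rb]_m \Big| + \max_{n \notin \mathcal{I}} |x_n^H w| \\
&= \|X_{\Pi^c}^H X_\Pi r_{\mathcal{I}}\|_\infty + \max_{n \notin \mathcal{I}} |x_n^H w| \\
&\le \epsilon \|r_{\mathcal{I}}\|_2 + \tau.
\end{aligned} \tag{54}$$

Combining (53) and (54), we have that under the event $\mathcal{G}_1 \cap \mathcal{H}_0$,

$$\max_{n \in \mathcal{I}} |x_n^H y| \ge |r|_{(1)} - 2\epsilon \|r_{\mathcal{I}}\|_2 - 2\tau + \max_{n \notin \mathcal{I}} |x_n^H y|. \tag{55}$$

So when $\mathcal{G}_1 \cap \mathcal{H}_0$ occurs, under the condition (38), we obtain (39), as required.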

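Furthermore, to detect the symbol correctly, for $\Re[b_n] = 1/\sqrt{2}$, $\Re[r_n^* x_n^H y]$ has to be positive, and for $\Re[b_n] = -1/\sqrt{2}$, $\Re[r_n^* x_n^H y]$ has to be negative. Similarly we can detect $\Im[b_n]$. First assume $\Re[b_{n_1}] = 1/\sqrt{2}$; then

$$\Re[r_{n_1}^* x_{n_1}^H y] = \frac{|r_{n_1}|^2}{\sqrt{2}} + \sum_{m \ne n_1} \Re[b_m r_{n_1}^* r_m x_{n_1}^H x_m] + \Re[r_{n_1}^* x_{n_1}^H w]$$

must be positive. Suppose this does not hold, and $\Re[r_{n_1}^* x_{n_1}^H y] < 0$. Recall that $n_0$ is the index of the largest gain: $|r_{n_0}| = |r|_{(1)}$. From (40), we have

$$|r_{n_1}^* x_{n_1}^H y| \ge |r_{n_1}^* x_{n_0}^H y|. \tag{56}$$

Since

$$|r_{n_1}^* x_{n_1}^H y| \le \cdots \le |r_{n_1}| \left( \epsilon \|r_{\mathcal{I}}\|_2 + \tau \right), \tag{57}$$

where (57) follows from $\Re[r_{n_1}^* x_{n_1}^H y] < 0$, and, similarly to earlier derivations,

$$|r_{n_1}^* x_{n_0}^H y| \ge |r_{n_1}| \left( |r|_{(1)} - \epsilon \|r_{\mathcal{I}}\|_2 - \tau \right), \tag{58}$$

we have that once (38) holds, $|r_{n_1}^* x_{n_0}^H y| > |r_{n_1}^* x_{n_1}^H y|$, which contradicts (56); hence $\mathrm{sgn}(\Re[r_{n_1}^* x_{n_1}^H y]) = 1$. A similar argument can be made for $\Re[b_{n_1}] = -1/\sqrt{2}$ and for the cases associated with $\Im[b_{n_1}]$, which completes the proof.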

D. Proof of Proposition 1 

Denote the index set of subsampled rows of the Gabor frame as $\Lambda$. Let $\phi_m(i)$ be the $i$th entry of $\phi_m$. The inner product between two distinct columns of $X$ is given, for $m \ne m'$, as

$$\langle x_m, x_{m'} \rangle = \sum_{i \in \Lambda} \phi_m^*(i) \phi_{m'}(i),$$

with the expectation $\mathbb{E}\langle x_m, x_{m'} \rangle = \langle \phi_m, \phi_{m'} \rangle$, whose absolute value is upper bounded by $\mu(\Phi)$, the worst case coherence of $\Phi$. Applying the triangle inequality and Hoeffding's inequality [26] we have, for $\gamma > 0$,

$$\Pr\left\{ |\langle x_m, x_{m'} \rangle| - \mu(\Phi) \ge \gamma \right\} \le 4 \exp\left( -\frac{\gamma^2 M}{4} \right).$$

Now we consider all pairs of different inner products and apply the union bound:

$$\Pr\left\{ \mu(X) - \mu(\Phi) \ge \gamma \right\} \le 2P^2(P^2 - 1) \exp\left( -\frac{\gamma^2 M}{4} \right).$$

Let $\gamma = \sqrt{20 \log P / M}$; then with probability at least $1 - 2P^{-1}$, we have

$$\mu(X) \le \mu(\Phi) + \sqrt{\frac{20 \log P}{M}}. \tag{59}$$

If the Gabor frame satisfies the coherence property such that $\mu(\Phi) \le \gamma_1 / \log P$ for some constant $\gamma_1$, then by choosing $M \ge \gamma \log^3 P$, we have $\mu(X) \le \gamma_2 / \log P$ for some constant $\gamma_2$ with probability at least $1 - 2P^{-1}$.

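We next consider the average coherence of $X$. Let $m = Pq + r$ and $m' = Pq' + r'$; we have

$$\sum_{m' \ne m} \langle x_m, x_{m'} \rangle = \sum_{i \in \Lambda} \sum_{m' \ne m} \phi_m^*(i) \phi_{m'}(i).$$

Each column in a Gabor frame can be written as

$$\phi_{Pq+r} = W_q \left[ g_{(1-r)_P}, g_{(2-r)_P}, \ldots, g_{(P-r)_P} \right]^T,$$

where $(1 - r)_P = \mathrm{mod}(1 - r, P)$. If $r \ne r'$, we have

$$\sum_{q'=0}^{P-1} \sum_{r' \ne r} \phi_m^*(i) \phi_{m'}(i) = P g_{(1-r)_P}^* \sum_{r' \ne r} g_{(1-r')_P} \cdot 1_{\{i=1\}}.$$

If $r = r'$, $q \ne q'$, we have

$$\sum_{q' \ne q} \phi_m^*(i) \phi_{m'}(i) = \frac{1}{M} \left[ (P - 1) \cdot 1_{\{i=1\}} - 1_{\{i \ne 1\}} \right],$$

where we use the fact $|g_i|^2 = 1/M$. To sum up, we have

$$\sum_{m' \ne m} \phi_m^*(i) \phi_{m'}(i) = \begin{cases} P g_{(1-r)_P}^* \sum_{r' \ne r} g_{(1-r')_P} + (P - 1)/M, & i = 1, \\ -1/M, & i \ne 1. \end{cases}$$

Then $\left| \sum_{m' \ne m} \langle x_m, x_{m'} \rangle \right| \le \frac{P}{\sqrt{M}} \|g\|_1 + 1 = \frac{P^2}{M} + 1$, and the average coherence can be bounded deterministically as

$$\nu(X) \le \frac{P^2 + M}{P^2 - 1} \cdot \frac{1}{M} \le \frac{2}{M}.$$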

E. Proof of Proposition 2 

The analysis of the worst-case coherence is exactly as in the proof of Proposition 1, hence is not repeated. Regarding the average coherence, the columns in the Kerdock set $\Psi$ form an abelian group $\mathcal{G}$ under point-wise multiplication. By the fundamental group property, if every row contains some entry not equal to 1, then the column group $\mathcal{G}$ satisfies $\sum_{g \in \mathcal{G}} g = 0$. Let $x_m(i)$ and $\psi_m(i)$ be the $i$th entry respectively of $x_m$ and $\psi_m$. When $i \ne 0$, the subsampled Kerdock set $X$ then satisfies

$$\sum_{m' \ne m} \langle x_m(i), x_{m'}(i) \rangle = -\frac{1}{M},$$

and

$$\sum_{m' \ne m} \langle x_m(0), x_{m'}(0) \rangle = \frac{P^2 - 1}{M}.$$

Then the average coherence is bounded as

$$\nu(X) \le \frac{1}{P^2 - 1} \Big| \sum_{m' \ne m} \langle x_m, x_{m'} \rangle \Big| \le \frac{P^2 + M - 2}{(P^2 - 1) M} \le \frac{2}{M}.$$